Data manufacturing frameworks for synthesizing synthetic training data to facilitate training a natural language to logical form model

ABSTRACT

Techniques are disclosed herein for synthesizing synthetic training data to facilitate training a natural language to logical form model. In one aspect, training data can be synthesized from original training data under a framework based on templates and a synchronous context-free grammar. In one aspect, training data can be synthesized under a framework based on a probabilistic context-free grammar and a translator. In one aspect, training data can be synthesized under a framework based on tree-to-string translation. In one aspect, the synthetic training data can be combined with original training data in order to train a machine learning model to translate an utterance to a logical form.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a non-provisional application of and claims benefit and priority under 35 U.S.C. 119(e) of U.S. Provisional Application No. 63/289,480, filed Dec. 14, 2021, the entire contents of which are incorporated herein by reference for all purposes.

FIELD

The present disclosure generally relates to transforming natural language to Structured Query Language, and more particularly, to data manufacturing frameworks for synthesizing synthetic training data to facilitate training a natural language to logical form model.

BACKGROUND

Structured Query Language (SQL) is a domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system (RDSMS). It is particularly useful in handling structured data (i.e., data incorporating relations among entities and variables). SQL includes sublanguages such as a data query language (DQL), a data definition language (DDL), a data control language (DCL), and a data manipulation language (DML). The scope of SQL includes data query, data manipulation (insert, update, and delete), data definition (schema creation and modification), and data access control. Although SQL is essentially a declarative language (4GL), it also includes procedural elements. In order to effectively leverage data, RDBMS and RDSMS users are required to not only have prior knowledge about the database schema (e.g., table and column names) but also a working understanding of the syntax and semantics of SQL. Thus, despite its expressiveness, SQL can often hinder non-technical users from exploring and making use of their data.

Natural language is an alternative interface to data held or implemented in RDBMS and RDSMS because it allows non-technical users to formulate complex questions in a more concise manner than SQL. Using semantic parsing, natural language statements, requests, and questions can be transformed into logical forms or meaning representations that can be executed by an application (e.g., model, program, machine, etc.). For example, semantic parsing can transform natural language sentences directly into general purpose programming languages such as Python, Java, and SQL. Processes for transforming natural language sentences to SQL queries typically include rule-based, statistical-based, and deep learning-based systems. Rule-based systems typically use a series of fixed rules to translate the natural language sentences to SQL queries. Rule-based systems are generally domain-specific and, thus, are considered inelastic and do not generalize well to new use cases (e.g., across different domains). Statistical-based systems label tokens (i.e., words or phrases) in an input natural language sentence according to their semantic role in the sentence and use the labels to fill slots in the SQL query but have limitations on the types of sentences that can be parsed (e.g., a sentence must be able to be represented as a parse tree). Deep learning-based systems, such as sequence-to-sequence models, involve training deep learning models that directly translate the natural language sentences to SQL queries and have been shown to generalize well to new use cases.
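By way of a non-limiting illustration, the short Python snippet below shows the kind of utterance-to-SQL training pair that semantic parsing produces and that the frameworks described herein consume. The “employees” table, its columns, and the pair itself are invented for illustration.

```python
# Illustrative only: a hypothetical utterance/SQL pair of the kind a
# semantic parser is trained to produce. The "employees" schema is invented.
training_example = {
    "utterance": "How many employees work in the Sales department?",
    "logical_form": "SELECT COUNT(*) FROM employees WHERE department = 'Sales'",
}

print(training_example["utterance"])
print("->", training_example["logical_form"])
```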

BRIEF SUMMARY

Data manufacturing frameworks are disclosed herein for synthesizing synthetic training data to facilitate training a natural language to logical form model.

In some embodiments, a method includes accessing original training data, the original training data including a plurality of utterances and a plurality of logical forms, each logical form of the plurality of logical forms corresponding to at least one utterance of the plurality of utterances; generating a plurality of templates, each template of the plurality of templates including a delexicalized version of an utterance in the plurality of utterances and a delexicalized version of a logical form corresponding to the utterance; learning a grammar from the plurality of logical forms, the grammar defining a plurality of production rules for lexicalizing the plurality of templates; generating synthetic training data by parsing each template of the plurality of templates, sampling a database to identify a plurality of sampling components, and lexicalizing each template of the plurality of templates with at least one sampling component of the plurality of sampling components; and training a machine learning model with the original training data and the synthetic training data to translate an utterance to a logical form.
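As a rough, non-authoritative sketch of the delexicalization step recited above, the following Python snippet replaces schema mentions in a hypothetical utterance/SQL pair with placeholder symbols to form a template. The placeholder names (TABLE, COLUMN, VALUE) and the example pair are assumptions for illustration, not part of the disclosed method.

```python
# A minimal delexicalization sketch: schema mentions and values in an
# utterance/SQL pair are replaced by placeholders to form a template.
def delexicalize(utterance: str, sql: str, table: str, column: str, value: str):
    for literal, placeholder in [(table, "TABLE"), (column, "COLUMN"), (value, "VALUE")]:
        utterance = utterance.replace(literal, placeholder)
        sql = sql.replace(literal, placeholder)
    return utterance, sql

template = delexicalize(
    "How many employees have salary above 50000?",
    "SELECT COUNT(*) FROM employees WHERE salary > 50000",
    table="employees", column="salary", value="50000",
)
print(template)
# ('How many TABLE have COLUMN above VALUE?',
#  'SELECT COUNT(*) FROM TABLE WHERE COLUMN > VALUE')
```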

In some embodiments, the plurality of templates is generated automatically from the plurality of utterances and the plurality of logical forms using a machine-learning model.

In some embodiments, the grammar is a synchronous context-free grammar.

In some embodiments, learning the grammar comprises setting one or more table names, column names, and values in database schema information included in the original training data as non-terminal symbols and generating the plurality of production rules by replacing one or more words, entities, or phrases in the plurality of utterances with the set non-terminal symbols.
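A minimal sketch of how such production rules might be derived from schema information follows. The schema dictionary is a made-up example; a real implementation would extract these items from the database schema information in the original training data.

```python
# Hypothetical schema information mapping non-terminal symbols to the
# schema items they may expand to.
schema = {
    "TABLE": ["employees", "departments"],
    "COLUMN": ["salary", "name", "budget"],
    "VALUE": ["50000", "'Sales'"],
}

# Each non-terminal expands to every schema item of its kind, yielding
# production rules such as TABLE -> employees | departments.
production_rules = [
    (non_terminal, terminal)
    for non_terminal, terminals in schema.items()
    for terminal in terminals
]
for lhs, rhs in production_rules:
    print(f"{lhs} -> {rhs}")
```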

In some embodiments, each template of the plurality of templates is parsed by applying a synchronous context-free grammar to the delexicalized version of the utterance and the delexicalized version of the logical form of the respective template to generate an abstract syntax tree for the respective template.

In some embodiments, lexicalizing each template of the plurality of templates with at least one sampling component of the plurality of sampling components comprises analyzing each template of the plurality of templates to identify one or more constraints in the respective template and sampling components of a database based on the identified one or more constraints in the respective template.
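The following sketch illustrates one possible form of constraint-aware sampling, assuming a SQLite database and a template whose COLUMN placeholder is constrained to numeric columns (e.g., because the template compares it with “>”). The table, columns, and constraint are hypothetical.

```python
import random
import sqlite3

# Hypothetical in-memory database standing in for the database being sampled.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER, age INTEGER)")

def numeric_columns(conn, table):
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    info = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [row[1] for row in info if row[2] in ("INTEGER", "REAL")]

# Sample a component that satisfies the template's numeric-column constraint.
column = random.choice(numeric_columns(conn, "employees"))
print("sampled numeric column:", column)  # e.g., 'salary' or 'age'
```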

In some embodiments, the method further includes accessing an utterance; inputting the utterance into the trained machine learning model; translating, using the trained machine learning model, the utterance into a logical form; executing the logical form as a query on a database to retrieve a result for the query; and outputting the result for the utterance.

In some embodiments, a method includes accessing original training data, the original training data including a plurality of utterances and a plurality of logical forms, each logical form of the plurality of logical forms corresponding to at least one utterance of the plurality of utterances; obtaining a pre-trained model trained to translate utterances to logical forms; finetuning the pre-trained model to translate logical forms to utterances, wherein the finetuning is performed using the original training data and generates a finetuned model; generating a set of delexicalized logical forms, at least one delexicalized logical form of the set of delexicalized logical forms being a delexicalized version of a logical form of the plurality of logical forms; generating a set of lexicalized logical forms by lexicalizing the set of delexicalized logical forms; generating, by the finetuned model, synthetic training data comprising an utterance for each lexicalized logical form of the set of lexicalized logical forms; and training a machine learning model with the original training data and the synthetic training data to translate an utterance to a logical form.
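A minimal sketch of the direction reversal underlying this finetuning step is shown below, assuming a T5-style sequence-to-sequence model loaded through the Hugging Face transformers library (an assumption; the disclosure does not name a specific model or library). The logical form becomes the model input and the utterance becomes the target.

```python
# Assumes the transformers and torch packages are installed; "t5-small"
# is a placeholder checkpoint, and the single pair is invented.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Reversing direction just flips (input, target): logical forms are fed
# as inputs and utterances serve as labels.
inputs = tokenizer("SELECT COUNT(*) FROM employees", return_tensors="pt")
labels = tokenizer("How many employees are there?", return_tensors="pt").input_ids

loss = model(**inputs, labels=labels).loss  # minimize this during finetuning
print(float(loss))
```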

In some embodiments, finetuning the pre-trained model comprises adjusting weights and parameters of the pre-trained model based on the original training data and using one or more machine learning optimization techniques.

In some embodiments, at least one delexicalized logical form of the set of delexicalized logical forms is generated automatically using a machine-learning model.

In some embodiments, at least one delexicalized logical form of the set of delexicalized logical forms is generated using a probabilistic context-free grammar.
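For illustration, a toy probabilistic context-free grammar over delexicalized SQL and a sampler for it are sketched below. The rules and probabilities are invented; in practice they would be estimated from the logical forms in the original training data.

```python
import random

# A toy PCFG over delexicalized SQL. Rules map a non-terminal to weighted
# expansions; TABLE, COLUMN, and VALUE are left as placeholders.
PCFG = {
    "QUERY": [(["SELECT", "SEL", "FROM", "TABLE", "COND"], 1.0)],
    "SEL": [(["COLUMN"], 0.6), (["COUNT(*)"], 0.4)],
    "COND": [(["WHERE", "COLUMN", "=", "VALUE"], 0.7), ([], 0.3)],
}

def sample(symbol):
    if symbol not in PCFG:  # terminal or placeholder symbol
        return [symbol]
    expansions, weights = zip(*PCFG[symbol])
    choice = random.choices(expansions, weights=weights)[0]
    return [token for child in choice for token in sample(child)]

print(" ".join(sample("QUERY")))
# e.g., "SELECT COUNT(*) FROM TABLE WHERE COLUMN = VALUE"
```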

In some embodiments, generating the set of lexicalized logical forms comprises analyzing each delexicalized logical form of the set of delexicalized logical forms to identify one or more constraints in the respective delexicalized logical form and sampling components of a database based on the identified one or more constraints in the respective delexicalized logical form.

In some embodiments, generating the synthetic training data comprises translating, by the finetuned model, each lexicalized logical form of the set of lexicalized logical forms into an utterance.

In some embodiments, the method further includes accessing a natural language utterance; inputting the natural language utterance into the trained machine learning model; translating, using the trained machine learning model, the natural language utterance into a logical form; executing the logical form as a query on a database to retrieve a result for the query; and outputting the result for the natural language utterance.

In some embodiments, a method includes accessing original training data, the original training data including a plurality of utterances and a plurality of logical forms, each logical form of the plurality of logical forms corresponding to at least one utterance of the plurality of utterances; generating a set of abstract syntax trees for the plurality of logical forms; generating a set of delexicalized logical forms, each delexicalized logical form of the set of delexicalized logical forms being a delexicalized version of a logical form of the plurality of logical forms; generating a set of lexicalized logical forms by lexicalizing the set of delexicalized logical forms; generating, by a tree-to-string model, synthetic training data comprising an utterance for each lexicalized logical form of the set of lexicalized logical forms; and training a machine learning model with the original training data and the synthetic training data to translate an utterance to a logical form.

In some embodiments, the set of abstract syntax trees is generated for the plurality of logical forms by parsing each logical form of the plurality of logical forms into an abstract syntax tree and normalizing the respective abstract syntax tree.
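A simplified sketch of such normalization follows, representing an abstract syntax tree as nested Python tuples. The specific normalization shown (sorting the operands of a commutative conjunction) is an assumed example of one normalization step, not the disclosed procedure.

```python
# Normalize an AST given as nested tuples of the form (label, children...).
def normalize(node):
    if isinstance(node, str):
        return node
    label, *children = node
    children = [normalize(child) for child in children]
    if label == "AND":  # conjunct order does not change query semantics
        children = sorted(children, key=repr)
    return (label, *children)

ast = ("SELECT", ("COLS", "name"),
       ("AND", ("GT", "salary", "50000"), ("EQ", "dept", "'Sales'")))
print(normalize(ast))  # conjuncts come out in a canonical order
```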

In some embodiments, at least one delexicalized logical form of the set of delexicalized logical forms is generated automatically using a machine-learning model.

In some embodiments, generating the set of lexicalized logical forms comprises analyzing each delexicalized logical form of the set of delexicalized logical forms to identify one or more constraints in the respective delexicalized logical form and sampling components of a database based on the identified one or more constraints in the respective delexicalized logical form.

In some embodiments, generating the synthetic training data comprises translating, by the tree-to-string model, each lexicalized logical form of the set of lexicalized logical forms into an utterance.

In some embodiments, generating the synthetic training data comprises reordering each abstract syntax tree of the set of abstract syntax trees and decoding each of the reordered abstract syntax trees into an utterance.
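The following toy sketch illustrates reorder-then-decode over a tuple-based abstract syntax tree. The reordering rule (placing the FROM clause before the WHERE clause to match natural-language word order) and the phrase table are assumptions for illustration only.

```python
# Hypothetical phrase table mapping AST node labels to natural-language words.
PHRASES = {"SELECT": "show", "FROM": "of the", "WHERE": "whose"}

def reorder(node):
    # Example reorder rule applied at the root: FROM clause before WHERE clause.
    label, *children = node
    children.sort(key=lambda child: 0 if child[0] == "FROM" else 1)
    return (label, *children)

def decode(node):
    # Decode by pre-order traversal, mapping labels to phrases.
    if isinstance(node, str):
        return node
    label, *children = node
    words = [PHRASES.get(label, label.lower())]
    words += [decode(child) for child in children]
    return " ".join(words)

tree = ("SELECT", ("WHERE", "salary", ">", "50000"), ("FROM", "employees"))
print(decode(reorder(tree)))  # "show of the employees whose salary > 50000"
```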

In some embodiments, the method further includes accessing a natural language utterance; inputting the natural language utterance into the trained machine learning model; translating, using the trained machine learning model, the natural language utterance into a logical form; executing the logical form as a query on a database to retrieve a result for the query; and outputting the result for the natural language utterance.

Some embodiments include a system that includes one or more data processors; and one or more non-transitory computer-readable media storing instructions which, when executed by the one or more data processors, cause the one or more data processors to perform part or all of the operations and/or methods disclosed herein.

Some embodiments include a computer-program product tangibly embodied in one or more non-transitory machine-readable media, including instructions configured to cause one or more data processors to perform part or all of the operations and/or methods disclosed herein.

The techniques described above and below may be implemented in a number of ways and in a number of contexts. Several example implementations and contexts are provided with reference to the following figures, as described below in more detail. However, the following implementations and contexts are but a few of many.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a simplified block diagram of a distributed environment incorporating an exemplary embodiment.

FIG. 2 is a simplified block diagram of a computing system implementing a master bot according to certain embodiments.

FIG. 3 is a simplified block diagram of a computing system implementing a skill bot according to certain embodiments.

FIG. 4A is a simplified logical flow diagram of a data manufacturing framework for synthesizing synthetic training data based on templates and a synchronous context-free grammar according to certain embodiments.

FIG. 4B is another simplified logical flow diagram of a data manufacturing framework for synthesizing synthetic training data based on templates and a synchronous context-free grammar according to certain embodiments.

FIG. 4C is a simplified logical flow diagram of a data manufacturing framework for synthesizing synthetic training data based on a probabilistic context-free grammar and translator according to certain embodiments.

FIG. 4D is a simplified logical flow diagram of a data manufacturing framework for synthesizing synthetic training data based on tree-to-string translation according to certain embodiments.

FIG. 4E is a simplified diagram showing various aspects of a data manufacturing framework for synthesizing synthetic training data based on tree-to-string translation according to certain embodiments.

FIG. 4F is a simplified block diagram of a model training and deployment system according to certain embodiments.

FIG. 5A illustrates an example process for synthesizing synthetic training data based on templates and a synchronous context-free grammar according to certain embodiments.

FIG. 5B illustrates an example process for synthesizing synthetic training data based on a probabilistic context-free grammar and a translator according to certain embodiments.

FIG. 5C illustrates an example process for synthesizing synthetic training data based on tree-to-string translation according to certain embodiments.

FIG. 5D illustrates an example process for transforming natural language to logical form according to certain embodiments.

FIG. 6 depicts a simplified diagram of a distributed system for implementing various embodiments.

FIG. 7 is a simplified block diagram of one or more components of a system environment by which services provided by one or more components of an embodiment system may be offered as cloud services, in accordance with various embodiments.

FIG. 8 illustrates an example computer system that may be used to implement various embodiments.

DETAILED DESCRIPTION

In the following description, for the purposes of explanation, specific details are set forth in order to provide a thorough understanding of certain embodiments. However, it will be apparent that various embodiments may be practiced without these specific details. The figures and description are not intended to be restrictive. The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any embodiment or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments or designs.

Introduction

In recent years, the amount of data powering different industries and their systems has been increasing exponentially. The majority of business information is managed by relational databases that store, process, and retrieve data. Databases power information systems across multiple industries including retail (e.g., orders, cancellations, refunds), supply chain (e.g., raw materials, stocks, vendors), healthcare (e.g., medical records), and finance (e.g., financial business metrics) to name a few. Additionally, databases power customer support mechanisms, Internet search engines and knowledge bases, and much more. It is imperative for modern data-driven companies to track, in real time, the states of their companies and their businesses in order to quickly understand and diagnose any emerging issues, trends, or anomalies and take corrective actions. This tracking is usually performed manually by business analysts interfacing with databases using complex queries in declarative query languages like Structured Query Language (SQL).

Although SQL queries that address fundamental business metrics are common, predefined, and incorporated in commercial products that power insights into business metrics, other non-fundamental business metrics or follow-up business metrics must be manually coded by the analysts. Such static interactions between database queries and consumption of the corresponding results require time-consuming manual intervention and result in slow feedback cycles. It is vastly more efficient to have non-technical business leaders directly interact with the analytics tables via natural language queries that abstract away the underlying SQL code. Defining a SQL query requires a strong understanding of database schema and SQL syntax and can quickly get overwhelming for beginners and non-technical stakeholders. Efforts to bridge this communication gap have led to the development of a new type of processing called Natural Language Interface to Database (NLIDB). NLIDB allows users to access database information using natural language inquiries. This natural language database search capability has become more popular over recent years and, as such, companies are developing deep learning approaches for accessing specific databases using natural language. One such approach is natural language to SQL (NL2SQL). NL2SQL seeks to transform natural language statements, requests, and questions (i.e., sentences) to SQL queries so that individuals, including those unfamiliar with SQL, can run unstructured queries against databases. Additionally, NL2SQL enables digital assistants, such as chatbots, and other similar computational devices interacting with users to improve their responses when an answer or response to a query can be found in different databases with different schema.

However, deep learning approaches for NL2SQL require an enormous amount of training data in order to build accurate models (i.e., models that accurately depict the user's intent in their natural language query in the subsequent SQL query). The conventional approaches have typically ignored this problem and assumed the availability of large, manually curated training datasets (e.g., using crowd sourcing). In most cases, however, gathering and cleaning data is a substantial undertaking that requires a significant amount of time, effort, and money. Moreover, existing NL2SQL approaches attempt to build models that generalize to new and unseen databases, but these approaches do not perform as well on these databases as on the databases they were trained with. That is, NL2SQL models built with training data to query one particular database generally do not perform well (i.e., generalize) when those same models are used to access other databases in other domains.

Accordingly, a different approach is needed to address these challenges and others. The techniques described herein provide data manufacturing frameworks for synthesizing synthetic training data to facilitate training a natural language to logical form model (e.g., an NL2SQL model). The synthetic data can be synthesized using one or more data augmentation techniques described in detail throughout. Data augmentation is the process of increasing the amount of data assets (e.g., utterances and their corresponding SQL queries) by modifying existing data assets to create new ones. In other words, data augmentation increases the number of examples in the training set while also introducing more variety in what the model sees and learns from. Both these aspects make it more difficult for the model to simply memorize mappings while also encouraging the model to learn general patterns (i.e., generalize). While it may be possible to collect more real-world data, this is much more expensive and time consuming than using the data augmentation techniques described in detail herein.
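As a concrete, invented example of augmentation at the data level, the snippet below turns one utterance/SQL pair into several by swapping in different sampled values; the pairs and department names are placeholders.

```python
# One original pair plus two synthesized variants obtained by substituting
# different sampled values into both the utterance and the query.
original = ("List employees in Sales",
            "SELECT name FROM employees WHERE dept = 'Sales'")

sampled_departments = ["Finance", "Engineering"]
augmented = [
    (original[0].replace("Sales", dept), original[1].replace("Sales", dept))
    for dept in sampled_departments
]

print(len([original] + augmented), "training pairs instead of 1")
```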

In one particular aspect, a data manufacturing framework for synthesizing synthetic training data based on templates and a synchronous context-free grammar is provided. In some instances, a method includes accessing original training data, the original training data including a plurality of utterances and a plurality of logical forms, each logical form of the plurality of logical forms corresponding to at least one utterance of the plurality of utterances; generating a plurality of templates, each template of the plurality of templates including a delexicalized version of an utterance in the plurality of utterances and a delexicalized version of a logical form corresponding to the utterance; learning a grammar from the plurality of logical forms, the grammar defining a plurality of production rules for lexicalizing the plurality of templates; generating synthetic training data by parsing each template of the plurality of templates, sampling a database to identify a plurality of sampling components, and lexicalizing each template of the plurality of templates with at least one sampling component of the plurality of sampling components; and training a machine learning model with the original training data and the synthetic training data to translate an utterance to a logical form.

In another particular aspect, a data manufacturing framework for synthesizing synthetic training data based on a probabilistic context-free grammar and translator is provided. In some instances, a method includes accessing original training data, the original training data including a plurality of utterances and a plurality of logical forms, each logical form of the plurality of logical forms corresponding to at least one utterance of the plurality of utterances; obtaining a pre-trained model trained to translate utterances to logical forms; finetuning the pre-trained model to translate logical forms to utterances, wherein the finetuning is performed using the original training data and generates a finetuned model; generating a set of delexicalized logical forms, each delexicalized logical form of the set of delexicalized logical forms being a delexicalized version of a logical form of the plurality of logical forms; generating a set of lexicalized logical forms by lexicalizing the set of delexicalized logical forms; generating, by the finetuned model, synthetic training data comprising an utterance for each lexicalized logical form of the set of lexicalized logical forms; and training a machine learning model with the original training data and the synthetic training data to translate an utterance to a logical form.

In another particular aspect, a data manufacturing framework for synthesizing synthetic training data based on tree-to-string translation is provided. In some instances, a method includes accessing original training data, the original training data including a plurality of utterances and a plurality of logical forms, each logical form of the plurality of logical forms corresponding to at least one utterance of the plurality of utterances; generating a set of abstract syntax trees for the plurality of logical forms; generating a set of delexicalized logical forms, each delexicalized logical form of the set of delexicalized logical forms being a delexicalized version of a logical form of the plurality of logical forms; generating a set of lexicalized logical forms by lexicalizing the set of delexicalized logical forms; generating, by a tree-to-string model, synthetic training data comprising an utterance for each lexicalized logical form of the set of lexicalized logical forms; and training a machine learning model with the original training data and the synthetic training data to translate an utterance to a logical form.

Advantageously, using these various methods, large-scale synthetic data can be synthesized across various domains, which can then be sent through a verification process (e.g., a crowd sourcing platform) before being used to train a natural language to logical form model.

Bot and Analytic Systems

A bot (also referred to as a skill, chatbot, chatterbot, or talkbot) is a computer program that can perform conversations with end users. The bot can generally respond to natural-language messages (e.g., questions or comments) through a messaging application that uses natural-language messages. Enterprises may use one or more bots to communicate with end users through a messaging application. The messaging application may include, for example, over-the-top (OTT) messaging channels (such as Facebook Messenger, Facebook WhatsApp, WeChat, Line, Kik, Telegram, Talk, Skype, Slack, or SMS), virtual private assistants (such as Amazon Dot, Echo, or Show, Google Home, Apple HomePod, etc.), mobile and web app extensions that extend native or hybrid/responsive mobile apps or web applications with chat capabilities, or voice based input (such as devices or apps with interfaces that use Siri, Cortana, Google Voice, or other speech input for interaction).

In some examples, the bot may be associated with a Uniform Resource Identifier (URI). The URI may identify the bot using a string of characters. The URI may be used as a webhook for one or more messaging application systems. The URI may include, for example, a Uniform Resource Locator (URL) or a Uniform Resource Name (URN). The bot may be designed to receive a message (e.g., a hypertext transfer protocol (HTTP) post call message) from a messaging application system. The HTTP post call message may be directed to the URI from the messaging application system. In some examples, the message may be different from a HTTP post call message. For example, the bot may receive a message from a Short Message Service (SMS). While discussion herein refers to communications that the bot receives as a message, it should be understood that the message may be an HTTP post call message, a SMS message, or any other type of communication between two systems.

End users interact with the bot through conversational interactions (sometimes referred to as a conversational user interface (UI)), just as end users interact with other people. In some cases, the conversational interactions may include the end user saying “Hello” to the bot and the bot responding with a “Hi” and asking the end user how it can help. End users also interact with the bot through other types of interactions, such as transactional interactions (e.g., with a banking bot that is at least trained to transfer money from one account to another), informational interactions (e.g., with a human resources bot that is at least trained to check the remaining vacation hours the user has), and/or retail interactions (e.g., with a retail bot that is at least trained for discussing returning purchased goods or seeking technical support).

In some examples, the bot may intelligently handle end user interactions without intervention by an administrator or developer of the bot. For example, an end user may send one or more messages to the bot in order to achieve a desired goal. A message may include certain content, such as text, emojis, audio, image, video, or other method of conveying a message. In some examples, the bot may automatically convert content into a standardized form and generate a natural language response. The bot may also automatically prompt the end user for additional input parameters or request other additional information. In some examples, the bot may also initiate communication with the end user, rather than passively responding to end user utterances.

A conversation with a bot may follow a specific conversation flow including multiple states. The flow may define what would happen next based on an input. In some examples, a state machine that includes user defined states (e.g., end user intents) and actions to take in the states or from state to state may be used to implement the bot. A conversation may take different paths based on the end user input, which may impact the decision the bot makes for the flow. For example, at each state, based on the end user input or utterances, the bot may determine the end user's intent in order to determine the appropriate next action to take. As used herein and in the context of an utterance, the term “intent” refers to an intent of the user who provided the utterance. For example, the user may intend to engage the bot in a conversation to order pizza, where the user's intent would be represented through the utterance “order pizza.” A user intent can be directed to a particular task that the user wishes the bot to perform on behalf of the user. Therefore, utterances reflecting the user's intent can be phrased as questions, commands, requests, and the like.
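A minimal sketch of such a conversation-flow state machine is given below; the states, intents, and transitions are hypothetical.

```python
# Hypothetical dialog state machine: (current_state, inferred_intent) pairs
# map to the next state in the conversation flow.
TRANSITIONS = {
    ("start", "order_pizza"): "ask_size",
    ("ask_size", "provide_size"): "confirm_order",
    ("confirm_order", "confirm"): "done",
}

def step(state: str, intent: str) -> str:
    # Remain in the current state when no transition applies.
    return TRANSITIONS.get((state, intent), state)

state = "start"
for intent in ["order_pizza", "provide_size", "confirm"]:
    state = step(state, intent)
    print(intent, "->", state)
```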

In the context of the configuration of the bot, the term “intent” is also used herein to refer to configuration information for mapping a user's utterance to a specific task/action or category of task/action that the bot can perform. In order to distinguish between the intent of an utterance (i.e., a user intent) and the intent of the bot, the latter is sometimes referred to herein as a “bot intent.” A bot intent may comprise a set of one or more utterances associated with the intent. For instance, an intent for ordering pizza can have various permutations of utterances that express a desire to place an order for pizza. These associated utterances can be used to train an intent classifier of the bot to enable the intent classifier to subsequently determine whether an input utterance from a user matches the order pizza intent. Bot intents may be associated with one or more dialog flows for starting a conversation with the user and in a certain state. For example, the first message for the order pizza intent could be the question “What kind of pizza would you like?” In addition to associated utterances, bot intents may further comprise named entities that relate to the intent. For example, the order pizza intent could include variables or parameters used to perform the task of ordering pizza (e.g., topping 1, topping 2, pizza type, pizza size, pizza quantity, and the like). The value of an entity is typically obtained through conversing with the user.

FIG. 1 is a simplified block diagram of an environment 100 incorporating a chatbot system according to certain embodiments. Environment 100 comprises a digital assistant builder platform (DABP) 102 that enables users 104 of DABP 102 to create and deploy digital assistants or chatbot systems. DABP 102 can be used to create one or more digital assistants (or DAs) or chatbot systems. For example, as shown in FIG. 1, users 104 representing a particular enterprise can use DABP 102 to create and deploy a digital assistant 106 for users of the particular enterprise. For example, DABP 102 can be used by a bank to create one or more digital assistants for use by the bank's customers. The same DABP 102 platform can be used by multiple enterprises to create digital assistants. As another example, an owner of a restaurant (e.g., a pizza shop) may use DABP 102 to create and deploy a digital assistant that enables customers of the restaurant to order food (e.g., order pizza).

For purposes of this disclosure, a “digital assistant” is a tool that helps users of the digital assistant accomplish various tasks through natural language conversations. A digital assistant can be implemented using software only (e.g., the digital assistant is a digital tool implemented using programs, code, or instructions executable by one or more processors), using hardware, or using a combination of hardware and software. A digital assistant can be embodied or implemented in various physical systems or devices, such as in a computer, a mobile phone, a watch, an appliance, a vehicle, and the like. A digital assistant is also sometimes referred to as a chatbot system. Accordingly, for purposes of this disclosure, the terms digital assistant and chatbot system are interchangeable.

A digital assistant, such as digital assistant 106 built using DABP 102, can be used to perform various tasks via natural language-based conversations between the digital assistant and its users 108. As part of a conversation, a user may provide one or more user inputs 110 to digital assistant 106 and get responses 112 back from digital assistant 106. A conversation can include one or more of inputs 110 and responses 112. Via these conversations, a user can request one or more tasks to be performed by the digital assistant and, in response, the digital assistant is configured to perform the user-requested tasks and respond with appropriate responses to the user.

User inputs 110 are generally in a natural language form and are referred to as utterances. A user utterance 110 can be in text form, such as when a user types in a sentence, a question, a text fragment, or even a single word and provides it as input to digital assistant 106. In some examples, a user utterance 110 can be in audio input or speech form, such as when a user says or speaks something that is provided as input to digital assistant 106. The utterances are typically in a language spoken by the user. For example, the utterances may be in English, or some other language. When an utterance is in speech form, the speech input is converted to text form utterances in that particular language and the text utterances are then processed by digital assistant 106. Various speech-to-text processing techniques may be used to convert a speech or audio input to a text utterance, which is then processed by digital assistant 106. In some examples, the speech-to-text conversion may be done by digital assistant 106 itself.

An utterance, which may be a text utterance or a speech utterance, can be a fragment, a sentence, multiple sentences, one or more words, one or more questions, combinations of the aforementioned types, and the like. Digital assistant 106 is configured to apply natural language understanding (NLU) techniques to the utterance to understand the meaning of the user input. As part of the NLU processing for an utterance, digital assistant 106 is configured to perform processing to understand the meaning of the utterance, which involves identifying one or more intents and one or more entities corresponding to the utterance. Upon understanding the meaning of an utterance, digital assistant 106 may perform one or more actions or operations responsive to the understood meaning or intents. For purposes of this disclosure, it is assumed that the utterances are text utterances that have been provided directly by a user of digital assistant 106 or are the results of conversion of input speech utterances to text form. This however is not intended to be limiting or restrictive in any manner.

For example, a user input may request a pizza to be ordered by providing an utterance such as “I want to order a pizza.” Upon receiving such an utterance, digital assistant 106 is configured to understand the meaning of the utterance and take appropriate actions. The appropriate actions may involve, for example, responding to the user with questions requesting user input on the type of pizza the user desires to order, the size of the pizza, any toppings for the pizza, and the like. The responses provided by digital assistant 106 may also be in natural language form and typically in the same language as the input utterance. As part of generating these responses, digital assistant 106 may perform natural language generation (NLG). For the user ordering a pizza, via the conversation between the user and digital assistant 106, the digital assistant may guide the user to provide all the requisite information for the pizza order, and then at the end of the conversation cause the pizza to be ordered. Digital assistant 106 may end the conversation by outputting information to the user indicating that the pizza has been ordered.

At a conceptual level, digital assistant 106 performs various processing in response to an utterance received from a user. In some examples, this processing involves a series or pipeline of processing steps including, for example, understanding the meaning of the input utterance, determining an action to be performed in response to the utterance, where appropriate causing the action to be performed, generating a response to be output to the user responsive to the user utterance, outputting the response to the user, and the like. The NLU processing can include parsing the received input utterance to understand the structure and meaning of the utterance, and refining and reforming the utterance to develop a better understandable form (e.g., logical form) or structure for the utterance. Generating a response may include using NLG techniques.

The NLU processing performed by a digital assistant, such as digital assistant 106, can include various NLP related tasks such as sentence parsing (e.g., tokenizing, lemmatizing, identifying part-of-speech tags for the sentence, identifying named entities in the sentence, generating dependency trees to represent the sentence structure, splitting a sentence into clauses, analyzing individual clauses, resolving anaphoras, performing chunking, and the like). In certain examples, the NLU processing is performed by digital assistant 106 itself. In some other examples, digital assistant 106 may use other resources to perform portions of the NLU processing. For example, the syntax and structure of an input utterance sentence may be identified by processing the sentence using a parser, a part-of-speech tagger, and/or a named entity recognizer (NER). In one implementation, for the English language, a parser, a part-of-speech tagger, and a named entity recognizer such as ones provided by the Stanford NLP Group are used for analyzing the sentence structure and syntax. These are provided as part of the Stanford CoreNLP toolkit.

While the various examples provided in this disclosure show utterances in the English language, this is meant only as an example. In certain examples, digital assistant 106 is also capable of handling utterances in languages other than English. Digital assistant 106 may provide subsystems (e.g., components implementing NLU functionality) that are configured for performing processing for different languages. These subsystems may be implemented as pluggable units that can be called using service calls from an NLU core server. This makes the NLU processing flexible and extensible for each language, including allowing different orders of processing. A language pack may be provided for individual languages, where a language pack can register a list of subsystems that can be served from the NLU core server.

A digital assistant, such as digital assistant 106 depicted in FIG. 1, can be made available or accessible to its users 108 through a variety of different channels, such as but not limited to, via certain applications, via social media platforms, via various messaging services and applications, and other applications or channels. A single digital assistant can have several channels configured for it so that it can be run on and be accessed by different services simultaneously.

A digital assistant or chatbot system generally contains or is associated with one or more skills. In certain embodiments, these skills are individual chatbots (referred to as skill bots) that are configured to interact with users and fulfill specific types of tasks, such as tracking inventory, submitting timecards, creating expense reports, ordering food, checking a bank account, making reservations, buying a widget, and the like. For example, for the embodiment depicted in FIG. 1, digital assistant or chatbot system 106 includes skills 116-1, 116-2, 116-3, and so on. For purposes of this disclosure, the terms “skill” and “skills” are used synonymously with the terms “skill bot” and “skill bots,” respectively.

Each skill associated with a digital assistant helps a user of the digital assistant complete a task through a conversation with the user, where the conversation can include a combination of text or audio inputs provided by the user and responses provided by the skill bots. These responses may be in the form of text or audio messages to the user and/or using simple user interface elements (e.g., select lists) that are presented to the user for the user to make selections.

There are various ways in which a skill or skill bot can be associated or added to a digital assistant. In some instances, a skill bot can be developed by an enterprise and then added to a digital assistant using DABP 102. In other instances, a skill bot can be developed and created using DABP 102 and then added to a digital assistant created using DABP 102. In yet other instances, DABP 102 provides an online digital store (referred to as a “skills store”) that offers multiple skills directed to a wide range of tasks. The skills offered through the skills store may also expose various cloud services. In order to add a skill to a digital assistant being generated using DABP 102, a user of DABP 102 can access the skills store via DABP 102, select a desired skill, and indicate that the selected skill is to be added to the digital assistant created using DABP 102. A skill from the skills store can be added to a digital assistant as is or in a modified form (for example, a user of DABP 102 may select and clone a particular skill bot provided by the skills store, make customizations or modifications to the selected skill bot, and then add the modified skill bot to a digital assistant created using DABP 102).

Various different architectures may be used to implement a digital assistant or chatbot system. For example, in certain embodiments, the digital assistants created and deployed using DABP 102 may be implemented using a master bot/child (or sub) bot paradigm or architecture. According to this paradigm, a digital assistant is implemented as a master bot that interacts with one or more child bots that are skill bots. For example, in the embodiment depicted in FIG. 1, digital assistant 106 comprises a master bot 114 and skill bots 116-1, 116-2, etc. that are child bots of master bot 114. In certain examples, digital assistant 106 is itself considered to act as the master bot.

A digital assistant implemented according to the master-child bot architecture enables users of the digital assistant to interact with multiple skills through a unified user interface, namely via the master bot. When a user engages with a digital assistant, the user input is received by the master bot. The master bot then performs processing to determine the meaning of the user input utterance. The master bot then determines whether the task requested by the user in the utterance can be handled by the master bot itself; if not, the master bot selects an appropriate skill bot for handling the user request and routes the conversation to the selected skill bot. This enables a user to converse with the digital assistant through a common single interface and still provides the capability to use several skill bots configured to perform specific tasks. For example, for a digital assistant developed for an enterprise, the master bot of the digital assistant may interface with skill bots with specific functionalities, such as a customer relationship management (CRM) bot for performing functions related to customer relationship management, an enterprise resource planning (ERP) bot for performing functions related to enterprise resource planning, a human capital management (HCM) bot for performing functions related to human capital management, etc. This way, the end user or consumer of the digital assistant need only know how to access the digital assistant through the common master bot interface, and behind the scenes multiple skill bots are provided for handling the user request.

In certain examples, in a master bot/child bots infrastructure, the master bot is configured to be aware of the available list of skill bots. The master bot may have access to metadata that identifies the various available skill bots, and for each skill bot, the capabilities of the skill bot including the tasks that can be performed by the skill bot. Upon receiving a user request in the form of an utterance, the master bot is configured to, from the multiple available skill bots, identify or predict a specific skill bot that can best serve or handle the user request. The master bot then routes the utterance (or a portion of the utterance) to that specific skill bot for further handling. Control thus flows from the master bot to the skill bots. The master bot can support multiple input and output channels. In certain examples, routing may be performed with the aid of processing performed by one or more available skill bots. For example, as discussed below, a skill bot can be trained to infer an intent for an utterance and to determine whether the inferred intent matches an intent with which the skill bot is configured. Thus, the routing performed by the master bot can involve the skill bot communicating to the master bot an indication of whether the skill bot has been configured with an intent suitable for handling the utterance.

While the embodiment in FIG. 1 shows digital assistant 106 comprising a master bot 114 and skill bots 116-1, 116-2, and 116-3, this is not intended to be limiting. A digital assistant can include various other components (e.g., other systems and subsystems) that provide the functionalities of the digital assistant. These systems and subsystems may be implemented only in software (e.g., code, instructions stored on a computer-readable medium and executable by one or more processors), in hardware only, or in implementations that use a combination of software and hardware.

DABP 102 provides an infrastructure and various services and features that enable a user of DABP 102 to create a digital assistant including one or more skill bots associated with the digital assistant. In some instances, a skill bot can be created by cloning an existing skill bot, for example, cloning a skill bot provided by the skills store. As previously indicated, DABP 102 provides a skills store or skills catalog that offers multiple skill bots for performing various tasks. A user of DABP 102 can clone a skill bot from the skills store. As needed, modifications or customizations may be made to the cloned skill bot. In some other instances, a user of DABP 102 can create a skill bot from scratch using tools and services offered by DABP 102.

In certain examples, at a high level, creating or customizing a skill bot involves the following steps:

(1) Configuring settings for a new skill bot

(2) Configuring one or more intents for the skill bot

(3) Configuring one or more entities for one or more intents

(4) Training the skill bot

(5) Creating a dialog flow for the skill bot

(6) Adding custom components to the skill bot as needed

(7) Testing and deploying the skill bot

Each of the above steps is briefly described below.

(1) Configuring settings for a new skill bot—Various settings may be configured for the skill bot. For example, a skill bot designer can specify one or more invocation names for the skill bot being created. These invocation names can then be used by users of a digital assistant to explicitly invoke the skill bot. For example, a user can input an invocation name in the user's utterance to explicitly invoke the corresponding skill bot.

(2) Configuring one or more intents and associated example utterances for the skill bot—The skill bot designer specifies one or more intents (also referred to as bot intents) for a skill bot being created. The skill bot is then trained based upon these specified intents. These intents represent categories or classes that the skill bot is trained to infer for input utterances. Upon receiving an utterance, a trained skill bot infers an intent for the utterance, where the inferred intent is selected from the predefined set of intents used to train the skill bot. The skill bot then takes an appropriate action responsive to an utterance based upon the intent inferred for that utterance. In some instances, the intents for a skill bot represent tasks that the skill bot can perform for users of the digital assistant. Each intent is given an intent identifier or intent name. For example, for a skill bot trained for a bank, the intents specified for the skill bot may include “CheckBalance,” “TransferMoney,” “DepositCheck,” and the like.

For each intent defined for a skill bot, the skill bot designer may also provide one or more example utterances that are representative of and illustrate the intent. These example utterances are meant to represent utterances that a user may input to the skill bot for that intent. For example, for the CheckBalance intent, example utterances may include “What's my savings account balance?”, “How much is in my checking account?”, “How much money do I have in my account,” and the like. Accordingly, various permutations of typical user utterances may be specified as example utterances for an intent.

The intents and their associated example utterances are used as training data to train the skill bot. Various different training techniques may be used. As a result of this training, a predictive model is generated that is configured to take an utterance as input and output an intent inferred for the utterance by the predictive model. In some instances, input utterances are provided to an intent analysis engine, which is configured to use the trained model to predict or infer an intent for the input utterance. The skill bot may then take one or more actions based upon the inferred intent.

(3) Configuring entities for one or more intents of the skill bot—In some instances, additional context may be needed to enable the skill bot to properly respond to a user utterance. For example, there may be situations where different user input utterances resolve to the same intent in a skill bot. For instance, in the above example, the utterances “What's my savings account balance?” and “How much is in my checking account?” both resolve to the same CheckBalance intent, but these utterances are different requests asking for different things. To clarify such requests, one or more entities are added to an intent. Using the banking skill bot example, an entity called AccountType, which defines values called “checking” and “saving,” may enable the skill bot to parse the user request and respond appropriately. In the above example, while the utterances resolve to the same intent, the value associated with the AccountType entity is different for the two utterances. This enables the skill bot to perform possibly different actions for the two utterances in spite of them resolving to the same intent. One or more entities can be specified for certain intents configured for the skill bot. Entities are thus used to add context to the intent itself. Entities help describe an intent more fully and enable the skill bot to complete a user request.

In certain examples, there are two types of entities: (a) built-in entities provided by DABP 102, and (b) custom entities that can be specified by a skill bot designer. Built-in entities are generic entities that can be used with a wide variety of bots. Examples of built-in entities include, without limitation, entities related to time, date, addresses, numbers, email addresses, duration, recurring time periods, currencies, phone numbers, URLs, and the like. Custom entities are used for more customized applications. For example, for a banking skill, an AccountType entity may be defined by the skill bot designer that enables various banking transactions by checking the user input for keywords like checking, savings, credit cards, etc.

(4) Training the skill bot—A skill bot is configured to receive user input in the form of utterances, parse or otherwise process the received input, and identify or select an intent that is relevant to the received user input. As indicated above, the skill bot has to be trained for this. In certain embodiments, a skill bot is trained based upon the intents configured for the skill bot and the example utterances associated with the intents (collectively, the training data), so that the skill bot can resolve user input utterances to one of its configured intents. In certain examples, the skill bot uses a predictive model that is trained using the training data and allows the skill bot to discern what users say (or in some cases, are trying to say). DABP 102 provides various different training techniques that can be used by a skill bot designer to train a skill bot, including various machine-learning based training techniques, rules-based training techniques, and/or combinations thereof. In certain examples, a portion (e.g., 80%) of the training data is used to train a skill bot model and another portion (e.g., the remaining 20%) is used to test or verify the model. Once trained, the trained model (also sometimes referred to as the trained skill bot) can then be used to handle and respond to user utterances. In certain cases, a user's utterance may be a question that requires only a single answer and no further conversation. In order to handle such situations, a Q&A (question-and-answer) intent may be defined for a skill bot. This enables a skill bot to output replies to user requests without having to update the dialog definition. Q&A intents are created in a similar manner as regular intents. The dialog flow for Q&A intents can be different from that for regular intents.
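The 80/20 train/verify split mentioned above can be sketched as follows; the utterance list is a placeholder for real intent training data.

```python
import random

# Placeholder (utterance, intent) training data for the split illustration.
examples = [("What's my savings account balance?", "CheckBalance"),
            ("Send $20 to Alice", "TransferMoney")] * 50

random.shuffle(examples)
cut = int(0.8 * len(examples))
train_set, test_set = examples[:cut], examples[cut:]
print(len(train_set), "training /", len(test_set), "verification examples")
```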

(5) Creating a dialog flow for the skill bot—A dialog flow specified for a skill bot describes how the skill bot reacts as different intents for the skill bot are resolved responsive to received user input. The dialog flow defines operations or actions that a skill bot will take, e.g., how the skill bot responds to user utterances, how the skill bot prompts users for input, and how the skill bot returns data. A dialog flow is like a flowchart that is followed by the skill bot. The skill bot designer specifies a dialog flow using a language, such as markdown language. In certain embodiments, a version of YAML called OBotML may be used to specify a dialog flow for a skill bot. The dialog flow definition for a skill bot acts as a model for the conversation itself, one that lets the skill bot designer choreograph the interactions between a skill bot and the users that the skill bot services.

In certain examples, the dialog flow definition for a skill bot contains three sections:

(a) a context section

(b) a default transitions section

(c) a states section

Context section—The skill bot designer can define variables that are used in a conversation flow in the context section. Other variables that may be named in the context section include, without limitation: variables for error handling, variables for built-in or custom entities, user variables that enable the skill bot to recognize and persist user preferences, and the like.

Default transitions section—Transitions for a skill bot can be defined in the dialog flow states section or in the default transitions section. The transitions defined in the default transitions section act as a fallback and get triggered when there are no applicable transitions defined within a state, or the conditions required to trigger a state transition cannot be met. The default transitions section can be used to define routing that allows the skill bot to gracefully handle unexpected user actions.

States section—A dialog flow and its related operations are defined as a sequence of transitory states, which manage the logic within the dialog flow. Each state node within a dialog flow definition names a component that provides the functionality needed at that point in the dialog. States are thus built around the components. A state contains component-specific properties and defines the transitions to other states that get triggered after the component executes.

Special case scenarios may be handled using the states section. For example, there might be times when you want to provide users the option to temporarily leave a first skill they are engaged with in order to do something in a second skill within the digital assistant. For example, if a user is engaged in a conversation with a shopping skill (e.g., the user has made some selections for purchase), the user may want to jump to a banking skill (e.g., the user may want to ensure that he/she has enough money for the purchase), and then return to the shopping skill to complete the user's order. To address this, an action in the first skill can be configured to initiate an interaction with the second, different skill in the same digital assistant and then return to the original flow.

(6) Adding custom components to the skill bot—As described above, states specified in a dialog flow for a skill bot name components that provide the functionality needed corresponding to the states. Components enable a skill bot to perform functions. In certain embodiments, DABP 102 provides a set of preconfigured components for performing a wide range of functions. A skill bot designer can select one or more of these preconfigured components and associate them with states in the dialog flow for a skill bot. The skill bot designer can also create custom or new components using tools provided by DABP 102 and associate the custom components with one or more states in the dialog flow for a skill bot.

(7) Testing and deploying the skill bot—DABP 102 provides several features that enable the skill bot designer to test a skill bot being developed. The skill bot can then be deployed and included in a digital assistant.

While the description above describes how to create a skill bot, similar techniques may also be used to create a digital assistant (or the master bot). At the master bot or digital assistant level, built-in system intents may be configured for the digital assistant. These built-in system intents are used to identify general tasks that the digital assistant itself (i.e., the master bot) can handle without invoking a skill bot associated with the digital assistant. Examples of system intents defined for a master bot include: (1) Exit: applies when the user signals the desire to exit the current conversation or context in the digital assistant; (2) Help: applies when the user asks for help or orientation; and (3) Unresolved Intent: applies to user input that doesn't match well with the exit and help intents. The digital assistant also stores information about the one or more skill bots associated with the digital assistant. This information enables the master bot to select a particular skill bot for handling an utterance.

At the master bot or digital assistant level, when a user inputs a phrase or utterance to the digital assistant, the digital assistant is configured to perform processing to determine how to route the utterance and the related conversation. The digital assistant determines this using a routing model, which can be rules-based, AI-based, or a combination thereof. The digital assistant uses the routing model to determine whether the conversation corresponding to the user input utterance is to be routed to a particular skill for handling, is to be handled by the digital assistant or master bot itself per a built-in system intent, or is to be handled as a different state in a current conversation flow.

In certain embodiments, as part of this processing, the digital assistant determines if the user input utterance explicitly identifies a skill bot using its invocation name. If an invocation name is present in the user input, then it is treated as an explicit invocation of the skill bot corresponding to the invocation name. In such a scenario, the digital assistant may route the user input to the explicitly invoked skill bot for further handling. If there is no specific or explicit invocation, in certain embodiments, the digital assistant evaluates the received user input utterance and computes confidence scores for the system intents and the skill bots associated with the digital assistant. The score computed for a skill bot or system intent represents how likely the user input is representative of a task that the skill bot is configured to perform or is representative of a system intent. Any system intent or skill bot with an associated computed confidence score exceeding a threshold value (e.g., a Confidence Threshold routing parameter) is selected as a candidate for further evaluation. The digital assistant then selects, from the identified candidates, a particular system intent or a skill bot for further handling of the user input utterance. In certain embodiments, after one or more skill bots are identified as candidates, the intents associated with those candidate skills are evaluated (according to the intent model for each skill) and confidence scores are determined for each intent. In general, any intent that has a confidence score exceeding a threshold value (e.g., 70%) is treated as a candidate intent. If a particular skill bot is selected, then the user utterance is routed to that skill bot for further processing. If a system intent is selected, then one or more actions are performed by the master bot itself according to the selected system intent.

FIG. 2 is a simplified block diagram of a master bot (MB) system 200 according to certain embodiments. MB system 200 can be implemented in software only, hardware only, or a combination of hardware and software. MB system 200 includes a pre-processing subsystem 210, a multiple intent subsystem (MIS) 220, an explicit invocation subsystem (EIS) 230, a skill bot invoker 240, and a data store 250. MB system 200 depicted in FIG. 2 is merely an example of an arrangement of components in a master bot. One of ordinary skill in the art would recognize many possible variations, alternatives, and modifications. For example, in some implementations, MB system 200 may have more or fewer systems or components than those shown in FIG. 2, may combine two or more subsystems, or may have a different configuration or arrangement of subsystems.

Pre-processing subsystem 210 receives an utterance “A” 202 from a user and processes the utterance through a language detector 212 and a language parser 214. As indicated above, an utterance can be provided in various ways including audio or text. The utterance 202 can be a sentence fragment, a complete sentence, multiple sentences, and the like. Utterance 202 can include punctuation. For example, if the utterance 202 is provided as audio, the pre-processing subsystem 210 may convert the audio to text using a speech-to-text converter (not shown) that inserts punctuation marks into the resulting text, e.g., commas, semicolons, periods, etc.

Language detector 212 detects the language of the utterance 202 based on the text of the utterance 202. The manner in which the utterance 202 is handled depends on the language, since each language has its own grammar and semantics. Differences between languages are taken into consideration when analyzing the syntax and structure of an utterance.

Language parser 214 parses the utterance 202 to extract part of speech (POS) tags for individual linguistic units (e.g., words) in the utterance 202. POS tags include, for example, noun (NN), pronoun (PN), verb (VB), and the like. Language parser 214 may also tokenize the linguistic units of the utterance 202 (e.g., to convert each word into a separate token) and lemmatize words. A lemma is the main form of a set of words as represented in a dictionary (e.g., “run” is the lemma for run, runs, ran, running, etc.). Other types of pre-processing that the language parser 214 can perform include chunking of compound expressions, e.g., combining “credit” and “card” into a single expression “credit card.” Language parser 214 may also identify relationships between the words in the utterance 202. For example, in some embodiments, the language parser 214 generates a dependency tree that indicates which part of the utterance (e.g., a particular noun) is a direct object, which part of the utterance is a preposition, and so on. The results of the processing performed by the language parser 214 form extracted information 205 and are provided as input to MIS 220 together with the utterance 202 itself.
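For illustration only, the following minimal sketch shows the kind of POS tagging, lemmatization, and dependency extraction described above, using the open-source spaCy library as a stand-in; the disclosure does not specify which parser language parser 214 actually uses, and the model name is an assumption.

import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this model has been downloaded

def parse_utterance(utterance: str):
    """Extract a POS tag, lemma, and dependency relation per token."""
    doc = nlp(utterance)
    return [
        {
            "token": token.text,
            "pos": token.pos_,        # e.g., NOUN, VERB, PRON
            "lemma": token.lemma_,    # e.g., "run" for "running"
            "dep": token.dep_,        # e.g., dobj (direct object), prep
            "head": token.head.text,  # governing word in the dependency tree
        }
        for token in doc
    ]

print(parse_utterance("I want to pay my credit card bill"))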

As indicated above, the utterance 202 can include more than one sentence. For purposes of detecting multiple intents and explicit invocation, the utterance 202 can be treated as a single unit even if it includes multiple sentences. However, in certain embodiments, pre-processing can be performed, e.g., by the pre-processing subsystem 210, to identify a single sentence among multiple sentences for multiple intents analysis and explicit invocation analysis. In general, the results produced by MIS 220 and EIS 230 are substantially the same regardless of whether the utterance 202 is processed at the level of an individual sentence or as a single unit comprising multiple sentences.

MIS 220 determines whether the utterance 202 represents multiple intents. Although MIS 220 can detect the presence of multiple intents in the utterance 202, the processing performed by MIS 220 does not involve determining whether the intents of the utterance 202 match to any intents that have been configured for a bot. Instead, processing to determine whether an intent of the utterance 202 matches a bot intent can be performed by an intent classifier 242 of the MB system 200 or by an intent classifier of a skill bot (e.g., as shown in FIG. 3). The processing performed by MIS 220 assumes that there exists a bot (e.g., a particular skill bot or the master bot itself) that can handle the utterance 202. Therefore, the processing performed by MIS 220 does not require knowledge of what bots are in the chatbot system (e.g., the identities of skill bots registered with the master bot) or knowledge of what intents have been configured for a particular bot.

To determine that the utterance 202 includes multiple intents, the MIS 220 applies one or more rules from a set of rules 252 in the data store 250. The rules applied to the utterance 202 depend on the language of the utterance 202 and may include sentence patterns that indicate the presence of multiple intents. For example, a sentence pattern may include a coordinating conjunction that joins two parts (e.g., conjuncts) of a sentence, where both parts correspond to a separate intent. If the utterance 202 matches the sentence pattern, it can be inferred that the utterance 202 represents multiple intents. It should be noted that an utterance with multiple intents does not necessarily have different intents (e.g., intents directed to different bots or to different intents within the same bot). Instead, the utterance could have separate instances of the same intent (e.g., “Place a pizza order using payment account X, then place a pizza order using payment account Y”).

As part of determining that the utterance 202 represents multiple intents, the MIS 220 also determines what portions of the utterance 202 are associated with each intent. MIS 220 constructs, for each intent represented in an utterance containing multiple intents, a new utterance for separate processing in place of the original utterance, e.g., an utterance “B” 206 and an utterance “C” 208, as depicted in FIG. 2. Thus, the original utterance 202 can be split into two or more separate utterances that are handled one at a time. MIS 220 determines, using the extracted information 205 and/or from analysis of the utterance 202 itself, which of the two or more utterances should be handled first. For example, MIS 220 may determine that the utterance 202 contains a marker word indicating that a particular intent should be handled first. The newly formed utterance corresponding to this particular intent (e.g., one of utterance 206 or utterance 208) will be the first to be sent for further processing by EIS 230. After a conversation triggered by the first utterance has ended (or has been temporarily suspended), the next highest priority utterance (e.g., the other one of utterance 206 or utterance 208) can then be sent to the EIS 230 for processing.
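A minimal sketch of this rule-based detection and splitting follows. The single conjunction pattern below is a hypothetical stand-in for the language-specific rules 252; the actual rule set is not specified here.

import re

# Hypothetical rule: a coordinating conjunction joining two conjuncts
# signals multiple intents; "then" also acts as an ordering marker word.
CONJUNCTION_PATTERN = re.compile(
    r"\s*(?:,\s*)?\b(?:and then|then|and also)\b\s*", re.IGNORECASE)

def split_multiple_intents(utterance: str):
    """Return one sub-utterance per detected intent, in handling order."""
    parts = [p.strip() for p in CONJUNCTION_PATTERN.split(utterance) if p.strip()]
    return parts if len(parts) > 1 else [utterance]

print(split_multiple_intents(
    "Place a pizza order using payment account X, "
    "then place a pizza order using payment account Y"))
# -> ['Place a pizza order using payment account X',
#     'place a pizza order using payment account Y']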

EIS 230 determines whether the utterance that it receives (e.g., utterance 206 or utterance 208) contains an invocation name of a skill bot. In certain embodiments, each skill bot in a chatbot system is assigned a unique invocation name that distinguishes the skill bot from other skill bots in the chatbot system. A list of invocation names can be maintained as part of skill bot information 254 in data store 250. An utterance is deemed to be an explicit invocation when the utterance contains a word match to an invocation name. If a bot is not explicitly invoked, then the utterance received by the EIS 230 is deemed a non-explicitly invoking utterance 234 and is input to an intent classifier (e.g., intent classifier 242) of the master bot to determine which bot to use for handling the utterance. In some instances, the intent classifier 242 will determine that the master bot should handle a non-explicitly invoking utterance. In other instances, the intent classifier 242 will determine a skill bot to route the utterance to for handling.

The explicit invocation functionality provided by the EIS 230 has several advantages. It can reduce the amount of processing that the master bot has to perform. For example, when there is an explicit invocation, the master bot may not have to do any intent classification analysis (e.g., using the intent classifier 242), or may have to do reduced intent classification analysis for selecting a skill bot. Thus, explicit invocation analysis may enable selection of a particular skill bot without resorting to intent classification analysis.

Also, there may be situations where there is an overlap in functionalities between multiple skill bots. This may happen, for example, if the intents handled by two skill bots overlap or are very close to each other. In such a situation, it may be difficult for the master bot to identify which of the multiple skill bots to select based upon intent classification analysis alone. In such scenarios, the explicit invocation disambiguates the particular skill bot to be used.

In addition to determining that an utterance is an explicit invocation, the EIS 230 is responsible for determining whether any portion of the utterance should be used as input to the skill bot being explicitly invoked. In particular, EIS 230 can determine whether part of the utterance is not associated with the invocation. The EIS 230 can perform this determination through analysis of the utterance and/or analysis of the extracted information 205. EIS 230 can send the part of the utterance not associated with the invocation to the invoked skill bot in lieu of sending the entire utterance that was received by the EIS 230. In some instances, the input to the invoked skill bot is formed simply by removing any portion of the utterance associated with the invocation. For example, “I want to order pizza using Pizza Bot” can be shortened to “I want to order pizza” since “using Pizza Bot” is relevant to the invocation of the pizza bot, but irrelevant to any processing to be performed by the pizza bot. In some instances, EIS 230 may reformat the part to be sent to the invoked bot, e.g., to form a complete sentence. Thus, the EIS 230 determines not only that there is an explicit invocation, but also what to send to the skill bot when there is an explicit invocation. In some instances, there may not be any text to input to the bot being invoked. For example, if the utterance was “Pizza Bot”, then the EIS 230 could determine that the pizza bot is being invoked, but there is no text to be processed by the pizza bot. In such scenarios, the EIS 230 may indicate to the skill bot invoker 240 that there is nothing to send.
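The following sketch illustrates one possible form of this explicit-invocation check and input stripping; the invocation names, bot identifiers, and trailing-connective cleanup are all hypothetical and are not part of the disclosure.

import re

# Hypothetical entries from skill bot information 254.
INVOCATION_NAMES = {"pizza bot": "PizzaBotSkill", "banking bot": "BankingBotSkill"}

def check_explicit_invocation(utterance: str):
    """Return (skill_bot_id, remaining_text) or (None, utterance)."""
    lowered = utterance.lower()
    for name, bot_id in INVOCATION_NAMES.items():
        if name in lowered:
            start = lowered.find(name)
            # Strip the invocation phrase; what remains is the input for the
            # invoked skill bot (possibly empty, as for the utterance "Pizza Bot").
            remaining = (utterance[:start] + utterance[start + len(name):]).strip(" ,.")
            # Also drop a dangling connective such as "using" or "with".
            remaining = re.sub(r"\b(?:using|with)\s*$", "", remaining).strip()
            return bot_id, remaining
    return None, utterance

print(check_explicit_invocation("I want to order pizza using Pizza Bot"))
# -> ('PizzaBotSkill', 'I want to order pizza')
print(check_explicit_invocation("Pizza Bot"))
# -> ('PizzaBotSkill', '')  i.e., nothing to send to the invoked bot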

Skill bot invoker 240 invokes a skill bot in various ways. For instance, skill bot invoker 240 can invoke a bot in response to receiving an indication 235 that a particular skill bot has been selected as a result of an explicit invocation. The indication 235 can be sent by the EIS 230 together with the input for the explicitly invoked skill bot. In this scenario, the skill bot invoker 240 will turn control of the conversation over to the explicitly invoked skill bot. The explicitly invoked skill bot will determine an appropriate response to the input from the EIS 230 by treating the input as a stand-alone utterance. For example, the response could be to perform a specific action or to start a new conversation in a particular state, where the initial state of the new conversation depends on the input sent from the EIS 230.

Another way in which skill bot invoker 240 can invoke a skill bot is through implicit invocation using the intent classifier 242. The intent classifier 242 can be trained, using machine-learning and/or rules-based training techniques, to determine a likelihood that an utterance is representative of a task that a particular skill bot is configured to perform. The intent classifier 242 is trained on different classes, one class for each skill bot. For instance, whenever a new skill bot is registered with the master bot, a list of example utterances associated with the new skill bot can be used to train the intent classifier 242 to determine a likelihood that a particular utterance is representative of a task that the new skill bot can perform. The parameters produced as a result of this training (e.g., a set of values for parameters of a machine-learning model) can be stored as part of skill bot information 254.

In certain embodiments, the intent classifier 242 is implemented using a machine-learning model, as described in further detail herein. Training of the machine-learning model may involve inputting at least a subset of utterances from the example utterances associated with various skill bots to generate, as an output of the machine-learning model, inferences as to which bot is the correct bot for handling any particular training utterance. For each training utterance, an indication of the correct bot to use for the training utterance may be provided as ground truth information. The behavior of the machine-learning model can then be adapted (e.g., through back-propagation) to minimize the difference between the generated inferences and the ground truth information.
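As a deliberately simple stand-in for such a model, the sketch below trains a one-class-per-skill-bot text classifier on each bot's example utterances (the ground truth) and emits per-bot confidence scores; a neural model adapted through back-propagation could be substituted. The utterances, labels, and use of scikit-learn are illustrative assumptions, not the disclosed implementation.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical example utterances, each labeled with the correct bot.
training_utterances = [
    ("order a large pepperoni pizza", "PizzaBotSkill"),
    ("I want two pizzas delivered", "PizzaBotSkill"),
    ("what is my checking account balance", "BankingBotSkill"),
    ("transfer money to my savings account", "BankingBotSkill"),
]
texts, labels = zip(*training_utterances)

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# predict_proba yields a per-skill-bot confidence score for a new utterance.
probs = model.predict_proba(["can I order a pizza"])[0]
print(dict(zip(model.classes_, probs)))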

In certain embodiments, the intent classifier 242 determines, for each skill bot registered with the master bot, a confidence score indicating a likelihood that the skill bot can handle an utterance (e.g., the non-explicitly invoking utterance 234 received from EIS 230). The intent classifier 242 may also determine a confidence score for each system level intent (e.g., help, exit) that has been configured. If a particular confidence score meets one or more conditions, then the skill bot invoker 240 will invoke the bot associated with the particular confidence score. For example, a threshold confidence score value may need to be met. Thus, an output 245 of the intent classifier 242 is either an identification of a system intent or an identification of a particular skill bot. In some embodiments, in addition to meeting a threshold confidence score value, the confidence score must exceed the next highest confidence score by a certain win margin. Imposing such a condition enables unambiguous routing to a particular skill bot when the confidence scores of multiple skill bots each exceed the threshold confidence score value.
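A short sketch of these selection conditions follows; the threshold and win-margin values are illustrative, not values specified by the disclosure.

from typing import Optional

CONFIDENCE_THRESHOLD = 0.7  # hypothetical Confidence Threshold routing parameter
WIN_MARGIN = 0.1            # hypothetical win margin

def select_route(confidence_scores: dict) -> Optional[str]:
    """Pick a system intent or skill bot from {name: confidence} scores."""
    ranked = sorted(confidence_scores.items(), key=lambda kv: kv[1], reverse=True)
    best_name, best_score = ranked[0]
    if best_score < CONFIDENCE_THRESHOLD:
        return None  # no route; fall back to unresolved-intent handling
    if len(ranked) > 1 and best_score - ranked[1][1] < WIN_MARGIN:
        return None  # too close to call between multiple candidates
    return best_name

print(select_route({"PizzaBotSkill": 0.82, "BankingBotSkill": 0.55, "help": 0.10}))
# -> 'PizzaBotSkill'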

After identifying a bot based on evaluation of confidence scores, the skill bot invoker 240 hands over processing to the identified bot. In the case of a system intent, the identified bot is the master bot. Otherwise, the identified bot is a skill bot. Further, the skill bot invoker 240 will determine what to provide as input 247 for the identified bot. As indicated above, in the case of an explicit invocation, the input 247 can be based on a part of an utterance that is not associated with the invocation, or the input 247 can be nothing (e.g., an empty string). In the case of an implicit invocation, the input 247 can be the entire utterance.

Data store 250 comprises one or more computing devices that store data used by the various subsystems of the master bot system 200. As explained above, the data store 250 includes rules 252 and skill bot information 254. The rules 252 include, for example, rules for determining, by MIS 220, when an utterance represents multiple intents and how to split an utterance that represents multiple intents. The rules 252 further include rules for determining, by EIS 230, which parts of an utterance that explicitly invokes a skill bot to send to the skill bot. The skill bot information 254 includes invocation names of skill bots in the chatbot system, e.g., a list of the invocation names of all skill bots registered with a particular master bot. The skill bot information 254 can also include information used by intent classifier 242 to determine a confidence score for each skill bot in the chatbot system, e.g., parameters of a machine-learning model.

FIG. 3 is a simplified block diagram of a skill bot system 300 according to certain embodiments. Skill bot system 300 is a computing system that can be implemented in software only, hardware only, or a combination of hardware and software. In certain embodiments, such as the embodiment depicted in FIG. 1, skill bot system 300 can be used to implement one or more skill bots within a digital assistant.

Skill bot system 300 includes an MIS 310, an intent classifier 320, and a conversation manager 330. The MIS 310 is analogous to the MIS 220 in FIG. 2 and provides similar functionality, including being operable to determine, using rules 352 in a data store 350: (1) whether an utterance represents multiple intents and, if so, (2) how to split the utterance into a separate utterance for each intent of the multiple intents. In certain embodiments, the rules applied by MIS 310 for detecting multiple intents and for splitting an utterance are the same as those applied by MIS 220. The MIS 310 receives an utterance 302 and extracted information 304. The extracted information 304 is analogous to the extracted information 205 in FIG. 2 and can be generated using the language parser 214 or a language parser local to the skill bot system 300.

Intent classifier 320 can be trained in a similar manner to the intent classifier 242 discussed above in connection with the embodiment of FIG. 2 and as described in further detail herein. For instance, in certain embodiments, the intent classifier 320 is implemented using a machine-learning model. The machine-learning model of the intent classifier 320 is trained for a particular skill bot, using at least a subset of example utterances associated with that particular skill bot as training utterances. The ground truth for each training utterance would be the particular bot intent associated with the training utterance.

The utterance 302 can be received directly from the user or supplied through a master bot. When the utterance 302 is supplied through a master bot, e.g., as a result of processing through MIS 220 and EIS 230 in the embodiment depicted in FIG. 2, the MIS 310 can be bypassed so as to avoid repeating processing already performed by MIS 220. However, if the utterance 302 is received directly from the user, e.g., during a conversation that occurs after routing to a skill bot, then MIS 310 can process the utterance 302 to determine whether the utterance 302 represents multiple intents. If so, then MIS 310 applies one or more rules to split the utterance 302 into a separate utterance for each intent, e.g., an utterance “D” 306 and an utterance “E” 308. If utterance 302 does not represent multiple intents, then MIS 310 forwards the utterance 302 to intent classifier 320 for intent classification without splitting the utterance 302.

Intent classifier 320 is configured to match a received utterance (e.g., utterance 306 or 308) to an intent associated with skill bot system 300. As explained above, a skill bot can be configured with one or more intents, each intent including at least one example utterance that is associated with the intent and used for training a classifier. In the embodiment of FIG. 2, the intent classifier 242 of the master bot system 200 is trained to determine confidence scores for individual skill bots and confidence scores for system intents. Similarly, intent classifier 320 can be trained to determine a confidence score for each intent associated with the skill bot system 300. Whereas the classification performed by intent classifier 242 is at the bot level, the classification performed by intent classifier 320 is at the intent level and therefore finer grained. The intent classifier 320 has access to intents information 354. The intents information 354 includes, for each intent associated with the skill bot system 300, a list of utterances that are representative of and illustrate the meaning of the intent and are typically associated with a task performable by that intent. The intents information 354 can further include parameters produced as a result of training on this list of utterances.

Conversation manager 330 receives, as an output of intent classifier 320, an indication 322 of a particular intent, identified by the intent classifier 320, as best matching the utterance that was input to the intent classifier 320. In some instances, the intent classifier 320 is unable to determine any match. For example, the confidence scores computed by the intent classifier 320 could fall below a threshold confidence score value if the utterance is directed to a system intent or an intent of a different skill bot. When this occurs, the skill bot system 300 may refer the utterance to the master bot for handling, e.g., to route to a different skill bot. However, if the intent classifier 320 is successful in identifying an intent within the skill bot, then the conversation manager 330 will initiate a conversation with the user.

The conversation initiated by the conversation manager 330 is a conversation specific to the intent identified by the intent classifier 320. For instance, the conversation manager 330 may be implemented using a state machine configured to execute a dialog flow for the identified intent. The state machine can include a default starting state (e.g., for when the intent is invoked without any additional input) and one or more additional states, where each state has associated with it actions to be performed by the skill bot (e.g., executing a purchase transaction) and/or dialog (e.g., questions, responses) to be presented to the user. Thus, the conversation manager 330 can determine an action/dialog 335 upon receiving the indication 322 identifying the intent and can determine additional actions or dialog in response to subsequent utterances received during the conversation.

Data store 350 comprises one or more computing devices that store data used by the various subsystems of the skill bot system 300. As depicted in FIG. 3, the data store 350 includes the rules 352 and the intents information 354. In certain embodiments, data store 350 can be integrated into a data store of a master bot or digital assistant, e.g., the data store 250 in FIG. 2.

Data Manufacturing Frameworks for Synthesizing Synthetic Training Data

Building a deep learning model that can transform a user's natural language query into a SQL query that matches the user's intent requires an enormous amount of training data. Additionally, building such a model that can generalize well to new and unseen databases is difficult because training data is usually domain-specific (i.e., designed for one particular database and/or application). Conventional approaches have assumed that large amounts of manually curated (e.g., using crowdsourcing) non-domain-specific training data is and/or will be available. However, gathering and curating such data is a substantial undertaking that requires a significant amount of time, effort, and cost. To overcome these challenges and others, data manufacturing frameworks are described herein for synthesizing synthetic training data to facilitate training a natural language to logical form model (e.g., an NL2SQL model).

In some instances, the natural language to logical form model (also described herein as a semantic parsing model) trained with the synthetic training data synthesized using the techniques described herein can be implemented in a chatbot system, as described with respect to FIGS. 1, 2 and 3. Nonetheless, while the data manufacturing frameworks are described in various instances herein with particular reference to natural language to logical form (such as SQL) and/or a chatbot system, it should be understood that these frameworks are applicable to other semantic parsing models (e.g., natural language to Python, Java, etc.) and/or artificial-intelligence based systems where a developer/user is interested in understanding a user's natural language utterance. Furthermore, herein, SQL queries are provided as examples of logical forms corresponding to utterances in training data; however, any kind of logical form may be included in the training data, according to various embodiments.

Framework Based on Templates and a Synchronous Context-Free Grammar

Synthetic training data that includes natural language (NL) utterances and corresponding SQL queries can be generated under a data manufacturing framework based on templates and a synchronous context-free grammar (SCFG). The data manufacturing framework can generate the synthetic training data by accessing original training data that includes NL utterances and corresponding SQL queries, generating templates and learning an SCFG from the original training data, and generating lexicalized training data (i.e., the synthetic training data) by lexicalizing the templates. In order to lexicalize the templates, the templates can be parsed and analyzed based on the SCFG, a relational database can be analyzed and sampled, the parsed templates can be populated with sampled database components to generate lexicalized training examples, and the lexicalized training examples can be validated to output the lexicalized training data. The original training data can be combined with the synthetic training data to form updated training data. In some instances, prior to combining the original training data and the synthetic training data, the lexicalized training data can be paraphrased to form paraphrased lexicalized training data, which can be combined with the original training data to form the updated training data.

FIG. 4A is a simplified logical flow diagram 4100 of an example process for generating training data under a data manufacturing framework based on templates and an SCFG. The flow starts with accessing training data 4102 (i.e., original training data). The training data 4102 can be obtained from one or more sources such as a database (not shown). The training data 4102 can include utterances and their corresponding SQL queries. For example, the training data 4102 can include the utterance “What is the average life expectancy in the United States of America?” and its corresponding SQL query “SELECT AVG(life_expectancy) FROM T1 WHERE country=‘United States of America’”. In some instances, each utterance can be labeled with a label indicating that it is an utterance, and its corresponding SQL query can be labeled with a label indicating that it is a SQL query and the utterance it corresponds to. For example, the utterance “How many square miles is the United States of America?” can be labeled with the label “first utterance” and its corresponding SQL query “SELECT size FROM T1 WHERE country=‘United States of America’” can be labeled with the label “first SQL query, first utterance”.

In some instances, the utterances and their corresponding SQL queries can pertain to a part or parts of a conversation (e.g., a conversation between users and/or between a user and a machine such as the chatbot described above). In some instances, the utterances and their corresponding SQL queries can be non-follow-up utterances and corresponding non-follow-up SQL queries and/or follow-up utterances and corresponding follow-up SQL queries. Non-follow-up utterances and corresponding SQL queries refer to initial utterances and corresponding SQL queries in sequences of utterances and corresponding SQL queries, and follow-up utterances and corresponding SQL queries refer to subsequent utterances and corresponding SQL queries in the sequences of utterances and corresponding SQL queries. In some instances, the utterances and their corresponding SQL queries can pertain to a single relational database or domain and/or multiple relational databases or domains.

The training data 4102 can also include database schema information. A database schema defines how data is organized within a database such as a relational database; this includes logical constraints such as table names, fields, data types, and the relationships between these entities. A relational database can be formed of one or more tables, with each table of the one or more tables including one or more columns and each column of the one or more columns including one or more values. Each table and column of a relational database can be named with unique identifiers, each of which can include one or more words. In some instances, one or more columns of the relational database may serve as a primary key, in which case each of the values of the one or more columns that serve as the primary key is unique from the others. In some instances, one or more columns of the relational database may serve as a foreign key, which serves to link the table which includes the one or more columns with another table in the relational database. In some instances, a table or column that does not reference another table or column can be considered a terminal table or column, and a table or column that references another table or column can be a non-terminal table or column. In some instances, the database schema information includes one or more data structures for storing the unique identifiers of the one or more tables, the unique identifiers of the one or more columns, and values of each relational database. The unique identifiers and values can be stored in one or more vectors and/or matrices. In some embodiments, a data structure storing schema information for a relational database can store a directed graph representing the unique identifiers and values.
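One possible shape for such a schema data structure is sketched below: unique table and column identifiers plus a directed graph of foreign-key references. The country/city schema and the dataclass layout are hypothetical illustrations, not the disclosed representation.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Column:
    name: str
    is_primary_key: bool = False
    foreign_key: Optional[str] = None  # "table.column" this column references

@dataclass
class Table:
    name: str
    columns: dict = field(default_factory=dict)

schema = {
    "country": Table("country", {
        "country_id": Column("country_id", is_primary_key=True),
        "name": Column("name"),
        "life_expectancy": Column("life_expectancy"),
    }),
    "city": Table("city", {
        "city_id": Column("city_id", is_primary_key=True),
        # A non-terminal column: it references another table (a foreign key).
        "country_id": Column("country_id", foreign_key="country.country_id"),
    }),
}

# Edges of the directed graph: referencing column -> referenced column.
edges = [(f"{t}.{c}", col.foreign_key)
         for t, table in schema.items()
         for c, col in table.columns.items() if col.foreign_key]
print(edges)  # [('city.country_id', 'country.country_id')]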

In some instances, the training data 4102 can include utterances and their corresponding SQL queries obtained from one or more public datasets such as the Spider, SParC, and/or CoSQL datasets. Additional information for the Spider dataset is found in “Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task” by Yu et al., published in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, the entire contents of which are hereby incorporated by reference as if fully set forth herein. Additional information for the SParC dataset is found in “SParC: Cross-Domain Semantic Parsing in Context” by Yu et al., published in Proceedings of the Association for Computational Linguistics 2019, the entire contents of which are hereby incorporated by reference as if fully set forth herein. Additional information for the CoSQL dataset is found in “CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases” by Yu et al., published in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, the entire contents of which are hereby incorporated by reference as if fully set forth herein.

Upon accessing the training data 4102, an SCFG 4106 can be learned from the training data 4102 at SCFG learning stage 4104 and templates 4110 can be generated at template generation stage 4108. In some instances, at SCFG learning stage 4104, the SCFG 4106 can be learned based on the utterances and database schema information included in the training data 4102. An SCFG refers to a grammar based on a finite set of synchronous rules where each synchronous rule has the general form [A₁→α₁, A₂→α₂], where A₁, A₂ are non-terminals and α₁, α₂ are synchronous strings (i.e., a bijection exists between the occurrences of non-terminals in α₁ and the occurrences of non-terminals in α₂, and this bijection is explicitly provided by the synchronous rule). Additional information for SCFGs can be found in “An Introduction to Synchronous Grammars” by Chiang, part of a tutorial given at ACL, the entire contents of which are hereby incorporated by reference as if fully set forth herein. In some instances, the SCFG 4106 can be learned by setting the table names, column names, and values in the database schema information as non-terminal symbols, setting SQL operators (e.g., Max, Min, =, Like, etc.) as non-terminal symbols, setting SQL functions (e.g., Average, Count, First, Last, etc.) as non-terminal symbols, and generating production rules by replacing entities/phrases in the utterances in the training data 4102 with the set of non-terminal symbols. Additional information for learning an SCFG for SQL can be found in “Grammar-based Neural Text-to-SQL Generation” by Lin et al., published in arXiv, and “Grappa: Grammar-Augmented Pre-Training for Table Semantic Parsing” by Yu et al., published in ICLR, the entire contents of which are hereby incorporated by reference as if fully set forth herein.
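A minimal sketch of this production-rule extraction follows: aligned phrases in an (utterance, SQL) pair are replaced with shared non-terminal symbols, yielding one synchronous string pair [A₁→α₁, A₂→α₂]. The alignments and the FUNC#0 symbol name are hypothetical; only the TABLE/COLUMN/VALUE naming appears elsewhere in this disclosure.

def learn_synchronous_rule(utterance, sql, alignments):
    """alignments: list of (utterance_phrase, sql_phrase, non_terminal)."""
    for nl_phrase, sql_phrase, non_terminal in alignments:
        utterance = utterance.replace(nl_phrase, non_terminal)
        sql = sql.replace(sql_phrase, non_terminal)
    return utterance, sql  # the synchronous string pair (alpha1, alpha2)

rule = learn_synchronous_rule(
    "What is the average life expectancy in the United States of America?",
    "SELECT AVG(life_expectancy) FROM T1 WHERE country = 'United States of America'",
    [
        ("average", "AVG", "FUNC#0"),
        ("life expectancy", "life_expectancy", "COLUMN#0"),
        ("United States of America", "United States of America", "VALUE#0"),
    ],
)
print(rule)
# ('What is the FUNC#0 COLUMN#0 in the VALUE#0?',
#  "SELECT FUNC#0(COLUMN#0) FROM T1 WHERE country = 'VALUE#0'")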

At template generation stage 4108, templates 4110 can be generated from the utterances and their corresponding SQL queries in the training data 4102. A template includes a delexicalized utterance and its corresponding delexicalized SQL query. A delexicalized utterance as used herein refers to an utterance in which entities in the utterance that correspond to the names of the tables, columns, and/or values of one or more relational databases are respectively replaced with non-terminal symbols that represent whether the respective entity corresponds to a table, column, or value. For example, the utterance “How many storms occurred in each region?” can be delexicalized into the delexicalized utterance “How many TABLE#2 occurred in each TABLE#1.COLUMN#0?” by replacing the terms “storms” and “region,” which respectively correspond to a table name and column name in an exemplary database schema, with corresponding non-terminal symbols. A delexicalized SQL query as used herein refers to a SQL query in which entities in the SQL query that correspond to the names of the tables, columns, and/or values of one or more relational databases are respectively replaced with non-terminal symbols that represent whether the respective entity corresponds to a table, column, or value. For example, delexicalization of the SQL query “SELECT T1.region_name, Count (*) FROM region AS T1 JOIN affected_region AS T2 ON T1.region_id=T2.region_id GROUP BY T1.region_id” yields the delexicalized SQL query “SELECT TABLE#0.COLUMN#0, Count (*) FROM TABLE#0 JOIN TABLE#1 ON TABLE#0.COLUMN#1=TABLE#1.COLUMN#0 GROUP BY TABLE#0.COLUMN#1.”

In some instances, at template generation stage 4108, templates 4110 can be generated automatically from the utterances and their corresponding SQL queries in the training data 4102 using a trained machine learning model. In some instances, the machine learning model is trained to perform approximate string matching. In some instances, the trained machine learning model can predict which words in a respective utterance correspond to table names, table column names, and column values in the database schema information included in the training data 4102 and replace those words with the non-terminal symbols. For example, as depicted in the examples above, predicted table names in the respective utterance can be replaced with the non-terminal symbol TABLE, predicted column names in the respective utterance can be replaced with the non-terminal symbol COLUMN, and predicted values in the respective utterance can be replaced with the non-terminal symbol VALUE. In some instances, the non-terminal symbols TABLE, COLUMN, and VALUE can include an index representing a table number or a column number and a hashtag symbol that separates the non-terminal symbol from its index (e.g., TABLE#0.COLUMN#0, TABLE#0). In some instances, the trained machine learning model can include one or more neural networks trained to perform approximate string matching with training data that includes utterances labeled with database schema information. In some instances, templates 4110 generated using a trained machine learning model may not generalize well across databases and may produce inconsistent delexicalized parts in both the utterance and the corresponding SQL query.
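The sketch below illustrates delexicalization by approximate string matching, using difflib in place of the trained neural matcher described above; the schema entries, span lengths, and similarity threshold are all hypothetical. Approximate (rather than exact) matching lets the plural “storms” in the utterance match the schema name “storm”.

from difflib import SequenceMatcher

# Hypothetical schema-name-to-non-terminal mapping.
SCHEMA_ENTRIES = {
    "storm": "TABLE#2",
    "region": "TABLE#1.COLUMN#0",
}

def delexicalize(utterance: str, threshold: float = 0.75) -> str:
    tokens = utterance.rstrip("?.").split()
    out, i = [], 0
    while i < len(tokens):
        replaced = False
        # Try longer spans first so multi-word names win over single words.
        for span in (2, 1):
            phrase = " ".join(tokens[i:i + span]).lower()
            for name, symbol in SCHEMA_ENTRIES.items():
                if SequenceMatcher(None, phrase, name).ratio() >= threshold:
                    out.append(symbol)
                    i += span
                    replaced = True
                    break
            if replaced:
                break
        if not replaced:
            out.append(tokens[i])
            i += 1
    return " ".join(out) + "?"

print(delexicalize("How many storms occurred in each region?"))
# -> "How many TABLE#2 occurred in each TABLE#1.COLUMN#0?"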

Alternatively, or additionally, in some instances, at template generation stage 4108, templates 4110 which can generalize well across databases and include consistent delexicalized parts in both the utterance and the corresponding SQL query can be generated by a user based on a rules scheme. In some instances, in order to generate a template, the user can apply the following rule scheme:

1. Non-terminals in the utterance and its corresponding SQL query in the template should be equal (i.e., there should not be a hanging non-terminal on either side).

2. There should not be any lexicalized entities in the template utterance that cannot generalize well across databases.

3. Non-terminals are defined as elements of either table, column, or value of a database schema with an enumeration starting from 0 (i.e., tables, columns, and values which do not reference another table, column, or value); all columns and values should be delexicalized, and a table should be delexicalized only if it is mentioned in the utterance but a column of that table does not appear anywhere in the utterance.

4. All parts of the utterance should be kept if feasible (e.g., do not remove a part of an utterance that corresponds to a SQL operator, but remove verbs of the utterance that hinder generalization of the template).

5. For any identified non-terminal symbol, replace the identified non-terminal symbol with the respective non-terminal symbol in all capital letters followed by a hashtag symbol that separates the non-terminal from its index (e.g., TABLE#0.COLUMN#0, TABLE#0).

6. Check the consistencies of the template and retain any templates that are consistent and discard any templates that are inconsistent.

Upon generating the templates 4110 and the SCFG 4106, at lexicalization stage 4112, the templates 4110 can be lexicalized to generate lexicalized training data 4114. In some instances, as shown in FIG. 4B, at the lexicalization stage 4200, the templates 4110 can be lexicalized using the SCFG 4106. A lexicalized utterance as used herein refers to a delexicalized utterance in which the non-terminal symbols of the utterance are replaced with sampled components of a database. For example, the delexicalized utterance “Show all TABLE#0.COLUMN#0 with at least 3 TABLE#0” can be lexicalized into the lexicalized utterance “Show all player attributes shot power with at least 3 player attributes” because “player attributes” is a table in the database and “shot power” is a column in that table. Similarly, a lexicalized SQL query as used herein refers to a delexicalized SQL query in which the non-terminal symbols are replaced with sampled components of a database. For example, the delexicalized SQL query “SELECT TABLE#0.COLUMN#0 FROM TABLE#0 GROUP BY TABLE#0.COLUMN#0 HAVING Count (*)>=3” can be lexicalized into the lexicalized SQL query “SELECT Player_Attributes.shot_power FROM Player_Attributes GROUP BY Player_Attributes.shot_power HAVING Count(*)>=3.0.”
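A minimal sketch of this substitution follows, mirroring the shot_power example above; the regex-based replacement and the helper names are assumptions, not the disclosed implementation.

import re

# Matches a whole TABLE#i.COLUMN#j symbol before a bare TABLE#i, plus VALUE#k.
NON_TERMINAL = re.compile(r"TABLE#\d+(?:\.COLUMN#\d+)?|VALUE#\d+")

def lexicalize(template: str, mapping: dict) -> str:
    """Replace each non-terminal symbol with its sampled database component."""
    return NON_TERMINAL.sub(lambda m: mapping[m.group(0)], template)

nl_map = {"TABLE#0.COLUMN#0": "player attributes shot power",
          "TABLE#0": "player attributes"}
sql_map = {"TABLE#0.COLUMN#0": "Player_Attributes.shot_power",
           "TABLE#0": "Player_Attributes"}

print(lexicalize("Show all TABLE#0.COLUMN#0 with at least 3 TABLE#0", nl_map))
print(lexicalize("SELECT TABLE#0.COLUMN#0 FROM TABLE#0 GROUP BY "
                 "TABLE#0.COLUMN#0 HAVING Count(*) >= 3", sql_map))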

In some instances, in order to generate the lexicalized training data 4114, at template parsing stage 4204, the templates 4110 are parsed to produce parsed templates 4206. In some instances, each parsed template 4206 includes a parsed delexicalized utterance and its corresponding parsed delexicalized SQL query. In some instances, the templates 4110 can be parsed using a parsing algorithm and the SCFG 4106. In some instances, the parsing algorithm can apply the SCFG 4106 to each delexicalized utterance and its corresponding delexicalized SQL query to generate an abstract syntax tree (AST) for each parsed delexicalized utterance and its corresponding parsed delexicalized SQL query in which their respective logical syntactic components are identified and represented in the AST. An AST as used herein refers to a tree representation of the abstract syntactic structure of the delexicalized utterances and their corresponding delexicalized SQL queries. In some instances, the ASTs can be structured in the Zephyr Abstract Syntax Description Language format. Additional information for the Zephyr Abstract Syntax Description Language format can be found in “The Zephyr Abstract Syntax Description Language” by Wang et al., published in Proceedings of the Conference on Domain-Specific Languages (DSL), the entire contents of which are hereby incorporated by reference as if fully set forth herein. For example, for the delexicalized SQL query “SELECT TABLE#0.COLUMN#0 FROM TABLE#0 WHERE TABLE#0.COLUMN#1=‘VALUE#0’”, the following AST can be generated:

{
  "_type": "sql",
  "select": {
    "_type": "select",
    "is_distinct": false,
    "aggs": [
      {
        "_type": "agg",
        "agg_id": {
          "_type": "NoneAggOp"
        },
        "val_unit": {
          "_type": "Column",
          "col_unit1": {
            "_type": "col_unit",
            "agg_id": {
              "_type": "NoneAggOp"
            },
            "is_distinct": false,
            "col_id": "__table#0.column#0__"
          }
        }
      }
    ]
  },
  "where": {
    "_type": "Eq",
    "val_unit": {
      "_type": "Column",
      "col_unit1": {
        "_type": "col_unit",
        "agg_id": {
          "_type": "NoneAggOp"
        },
        "is_distinct": false,
        "col_id": "__table#0.column#1__"
      }
    },
    "val1": {
      "_type": "String",
      "s": "\"value#0\""
    }
  }
}

In some instances, in order to generate the lexicalized training data 4114, at template analysis stage 4208, constrained sampling stage 4212, and lexicalization stage 4220, the templates 4110 are analyzed to identify one or more constraints in each template, a database 4216 is analyzed to identify its components, database components are sampled based on the identified one or more constraints in each template, and the non-terminal symbols in each delexicalized utterance and its corresponding delexicalized SQL query are replaced with the sampled components. In some instances, at template analysis stage 4208, each delexicalized utterance and its corresponding delexicalized SQL query of the templates 4110 are analyzed to identify each non-terminal symbol in the delexicalized utterance and the delexicalized SQL query. In some instances, each COLUMN non-terminal symbol is mapped to one of a plurality of COLUMN types and a constraint of a plurality of constraints is defined for each COLUMN non-terminal symbol based on its respective COLUMN type. In some instances, each VALUE non-terminal symbol is mapped to a corresponding COLUMN non-terminal symbol. In some instances, the plurality of COLUMN types includes a TEXT type column in which the values of the column are text, a NUMBER type column in which the values of the column are numbers, a BOOLEAN type column in which the values of the column are Boolean operators, a TIME type column in which the values of the column are timestamps, a CURRENCY type column in which the values of the column are currency values, and an AGE type column in which the values of the column are ages. In some instances, the plurality of constraints includes an Aggregateable, a Requestable, a Sortable, a Comparable, an Orderable, a Groupable, and a Filterable constraint. The foregoing column types and constraints are not intended to be limiting and other column types and constraints are possible (see below). In some instances, each COLUMN non-terminal symbol is mapped to one of a plurality of COLUMN types by matching the characters of the non-terminal symbol to the characters of the plurality of COLUMN types (e.g., via string-matching) and determining whether a VALUE non-terminal symbol matches the COLUMN type. For example, a COLUMN#0.NUMBER non-terminal symbol can be string-matched to a NUMBER type, and the type can be validated by determining whether a VALUE non-terminal symbol that is associated with the COLUMN#0 non-terminal symbol corresponds to a number. The following is an example of a mapping that includes a string-matching name check and a validation check:

Category     Data Type          Type Name Check   Validation Check
Meta Type    TEXT               —                 —
             NUMBER             —                 is_numeric( )
             BOOLEAN            —                 true/false/yes/no/t/f/y/n
             TIME               yes               —
             OTHERS             —                 —
Custom Type  CURRENCY           yes               —
             AGE                yes               is_numeric( ), value_within(1, 150)
             ENUM               no                num_rows >= 20, distinct values <= 5
             DATE               yes               —
             MEASURE_BYTES      yes               is_numeric( )
             MEASURE_WEIGHT     yes               is_numeric( )
             MEASURE_LENGTH     yes               is_numeric( )
             MEASURE_SURFACE    yes               is_numeric( )
             MEASURE_VOLUME     yes               is_numeric( )
             MEASURE_DURATION   yes               is_numeric( )
             PHONE_NUMBER       —                 all samples match phone number (e.g., using regular expressions)
             LOCATION           yes               —
             EMAIL_ADDRESS      yes               all samples match email (e.g., using regular expressions)
             URL_ADDRESS        —                 all samples match URL (e.g., using regular expressions)
             IP_ADDRESS         —                 all samples match IP address (e.g., using regular expressions)
             ID                 yes               —
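The sketch below illustrates the name-check/validation-check pattern from the table above for a few of the listed types; the function names, the specific name heuristics, and the small set of types covered are assumptions made for illustration.

def is_numeric(values):
    """Validation check: every sampled value parses as a number."""
    try:
        for v in values:
            float(v)
        return True
    except (TypeError, ValueError):
        return False

def infer_column_type(column_name: str, sample_values: list) -> str:
    name = column_name.lower()
    # Name check first (the "yes" entries above), then a validation check.
    if "age" in name and is_numeric(sample_values) \
            and all(1 <= float(v) <= 150 for v in sample_values):
        return "AGE"  # value_within(1, 150)
    if {str(v).lower() for v in sample_values} <= {
            "true", "false", "yes", "no", "t", "f", "y", "n"}:
        return "BOOLEAN"
    if is_numeric(sample_values):
        return "NUMBER"
    return "TEXT"

print(infer_column_type("player_age", ["23", "31", "19"]))  # AGE
print(infer_column_type("shot_power", ["71", "85"]))        # NUMBER
print(infer_column_type("country", ["USA", "France"]))      # TEXT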

In some instances, at constrained sampling stage 4212, components are sampled from the database 4216 based on a database analysis 4218 of the database 4216 and the analyzed templates 4210. In some instances, the database 4216 is one or more relational databases, with each database having components (e.g., tables, columns, and values). In some instances, a database analysis 4218 is performed on the one or more databases 4216 to identify their components. In some instances, the one or more databases 4216 and their components can be obtained from one or more sources such as a user or the WikiSQL dataset. Additional information is found in “Seq2sql: Generating structured queries from natural language using reinforcement learning” by Zhong et al., published in CoRR, the entire contents of which are hereby incorporated by reference as if fully set forth herein.

In some instances, components 4214 are sampled from the database 4216 based on the analysis of the non-terminal symbols in the analyzed templates 4210 at the template analysis stage 4208. In some instances, prior to sampling components, each database in the one or more databases 4216 can be analyzed at database analysis stage 4218 to identify the components (e.g., tables and their names, columns and their names, and the values in each column) of the respective database. In some instances, database analysis stage 4218 is performed for a database based on schema information for the database. In some instances, in order to sample components for a respective analyzed template of the templates 4210, a database of the one or more databases 4216 can be randomly selected and its components can be sampled and used to lexicalize the respective analyzed template. In some instances, for each respective analyzed template of the templates 4210, a table name is sampled from the selected database 4216 for each TABLE non-terminal symbol in the respective analyzed template 4210, a column name is sampled from the sampled table for each COLUMN non-terminal symbol in the respective analyzed template 4210, and a value is sampled from the sampled column for each VALUE non-terminal symbol in the respective analyzed template 4210. In some instances, for TABLE and COLUMN non-terminal symbols having an index (described above), the table and column names can be sampled based on the index value. For example, for a database having two tables and five columns in each table and a TABLE#1.COLUMN#2 non-terminal symbol, the second table and the third column in that table (given the enumeration starting from 0) can be sampled from the database.

In some instances, the table name can be randomly sampled from the database 4216. For example, a table name can be randomly selected from the database 4216. In some instances, the table name can be sampled from the database 4216 based on the index of a non-terminal symbol (e.g., a table with an index of 1 in the database can be sampled for a TABLE#1 non-terminal symbol). In some instances, the column name is sampled based on the table to which it belongs, the type of column it is, and its constraints. For example, a column name in the database belonging to a particular table, having numbers as values, and having Requestable as a constraint will be sampled for COLUMN non-terminal symbols having a NUMBER type and a Requestable constraint. In some instances, when sampling a column name for the utterance, any underscores (e.g., “_”) in the column name can be replaced with spaces and any periods (e.g., “.”) can be removed, but, when sampling a column name for the SQL query, the column name is not changed. In some instances, the table names and the column names sampled for non-terminal symbols in the utterance should be the same as those sampled for the same non-terminal symbols in the SQL query. In other words, there should be a one-to-one mapping between the sampled components for non-terminal symbols in the utterance and the sampled components for the same non-terminal symbols in the SQL query. In some instances, values are sampled by determining whether the VALUE non-terminal symbol is associated with an operator (e.g., <, <=, >, >=, LIKE, etc.), and sampling values from the corresponding column is performed based on the associated operator. For less-than and greater-than type operators, values are randomly sampled between the minimum and maximum values of the corresponding column. For the LIKE operator and similar operators, values are randomly sampled and the characters at the start, the end, and/or a combination of both the start and end of the sampled values are replaced with a “%” character. In some instances, whether to replace the starting characters, the ending characters, or both the starting and ending characters with a “%” character can be determined randomly, and the number of characters to replace can be random.
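The following sketch illustrates this operator-aware value sampling; the function shape and the choice of how many characters the “%” wildcard absorbs are assumptions.

import random

def sample_value(operator: str, column_values: list):
    """Sample a value for a VALUE non-terminal based on its operator."""
    if operator in ("<", "<=", ">", ">="):
        # Range operators: draw between the column's minimum and maximum.
        lo, hi = min(column_values), max(column_values)
        return random.uniform(lo, hi)
    value = str(random.choice(column_values))
    if operator == "LIKE":
        # Replace a random run of leading and/or trailing characters with "%".
        n = random.randint(1, max(1, len(value) - 1))
        side = random.choice(["start", "end", "both"])
        if side == "start":
            return "%" + value[n:]
        if side == "end":
            return value[:-n] + "%"
        return "%" + value[1:-1] + "%" if len(value) > 2 else "%"
    return value

random.seed(7)
print(sample_value(">", [10, 55, 90]))          # e.g., 42.17...
print(sample_value("LIKE", ["United States"]))  # e.g., "%tates" or "United St%"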

In some instances, at lexicalization stage 4220, each parsed template of the parsed templates 4206 can be lexicalized with the components 4214 sampled for the respective parsed template to produce lexicalized training examples 4222 (i.e., the synthetic training examples). In some instances, a parsed utterance and its corresponding parsed SQL query in the parsed templates 4206 can be lexicalized by replacing the non-terminal symbols of the utterance and its corresponding SQL query with the components 4214 of the selected database 4216 sampled for the respective parsed utterance and SQL query.

In some instances, at validation stage 4224, each lexicalized training example of the lexicalized training examples 4222 can be validated; lexicalized training examples that are valid can be included in the lexicalized training data 4114 (i.e., the synthetic training data) and lexicalized training examples that are not valid can be discarded (i.e., the discarded training examples 4226). In some instances, in order to validate each lexicalized training example, a constraint check can be performed on each lexicalized training example. In some instances, the constraint check is performed by executing the respective lexicalized SQL query against a database in the one or more databases 4216. In some instances, the database can be randomly selected from the one or more databases 4216. In some instances, the database can be database 4216. In some instances, if execution of the lexicalized SQL query against a database returns an empty result, the lexicalized SQL query can be discarded. In some instances, if execution of the lexicalized SQL query returns a non-empty result, the WHERE clauses in the SQL query are treated as true and the lexicalized SQL query is again executed against the database. In some instances, if execution of the lexicalized SQL query with its WHERE clauses treated as true returns a non-empty result, the lexicalized SQL query can be included in the lexicalized training data 4114; otherwise, the lexicalized SQL query can be discarded.
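A minimal sketch of this execution-based check follows, using an in-memory SQLite database as a hypothetical stand-in for the database 4216; treating the WHERE clause as true is approximated here by removing it before the second execution.

import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Player_Attributes (shot_power REAL)")
conn.executemany("INSERT INTO Player_Attributes VALUES (?)",
                 [(70,), (70,), (70,), (85,)])

def is_valid(sql: str) -> bool:
    try:
        if not conn.execute(sql).fetchall():
            return False  # empty result: discard the training example
        # Treat WHERE clauses as true by dropping them, then re-execute.
        relaxed = re.sub(r"\bWHERE\b.*?(?=\bGROUP\b|\bORDER\b|$)", "", sql,
                         flags=re.IGNORECASE | re.DOTALL)
        return bool(conn.execute(relaxed).fetchall())
    except sqlite3.Error:
        return False  # malformed or inconsistent query: discard

print(is_valid("SELECT shot_power FROM Player_Attributes "
               "WHERE shot_power > 60 GROUP BY shot_power "
               "HAVING Count(*) >= 3"))  # True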

In some instances, the constraint check is performed by determining whether the sampled columns of the sampled components 4214 used to lexicalize the respective lexicalized utterance and its corresponding SQL query are consistent with the type of the COLUMN non-terminal symbol in the delexicalized utterance and delexicalized SQL query prior to lexicalization. For example, a sampled COLUMN.TEXT component should not be used for a COLUMN.NUMBER non-terminal symbol in the delexicalized utterance and its corresponding delexicalized SQL query. In some instances, the constraint check is performed by determining whether the sampled columns of the sampled components 4214 satisfy one or more conditions based on their respective type and constraints. For example, a column with a COLUMN.NUMBER type with an Aggregateable constraint should be located in the lexicalized SQL query inside a MAX, MIN, AVG, or COUNT SQL operation. In some instances, exemplary column types, constraints, and conditions are as listed in the table below:

Constraint      Eligible Types            Condition
Aggregateable   Number, Time, Currency,   Column is inside a MAX, MIN,
                Age, Measure              AVG, COUNT operation
Requestable     All                       Column is requested inside
                                          SELECT clause
Sortable        Same as Aggregateable     Column is used inside these
                                          operators (<, >, <=, >=, BETWEEN)
Comparable      All except ID             Column is used inside equal and
                                          not equal operators
Orderable       All                       Column is used in ORDER BY clause
Groupable       All                       Column is used in GROUP BY clause
Filterable      All                       Column is used in WHERE clause
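
The conditions in the table above can be sketched, purely illustratively, as simple string checks against the lexicalized SQL query; the constraint names follow the table, while the matching logic itself is an assumption of this sketch:

def satisfies_constraint(constraint, column, sql_query):
    """Illustrative checks corresponding to rows of the table above."""
    q = sql_query.upper()
    col = column.upper()
    if constraint == "Requestable":
        # Column must be requested inside the SELECT clause.
        return col in q.split(" FROM ")[0]
    if constraint == "Groupable":
        return "GROUP BY" in q and col in q.split("GROUP BY")[1]
    if constraint == "Orderable":
        return "ORDER BY" in q and col in q.split("ORDER BY")[1]
    if constraint == "Filterable":
        return " WHERE " in q and col in q.split(" WHERE ")[1]
    if constraint == "Aggregateable":
        compact = q.replace(" ", "")
        return any(f"{op}({col}" in compact
                   for op in ("MAX", "MIN", "AVG", "COUNT"))
    # Sortable and Comparable checks would follow the same pattern.
    return True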

In some instances, a data structure can be generated and the lexicalized training data 4114 can be organized and stored in the data structure. In some instances, each lexicalized utterance and its corresponding lexicalized SQL query in the lexicalized training data 4114 can be organized in a predetermined format. In some instances, an exemplary predetermined format is as follows:

{
  "nl_lexicalized": "Show all player attributes shot power with at least 3 player attributes.",
  "sql_lexicalized": "SELECT Player_Attributes.shot_power FROM Player_Attributes GROUP BY Player_Attributes.shot_power HAVING Count(*) >= 3.0",
  "template": {
    "nl": "Show all TABLE#0.COLUMN#0 with at least 3 TABLE#0 .",
    "sql": "SELECT TABLE#0.COLUMN#0 FROM TABLE#0 GROUP BY TABLE#0.COLUMN#0 HAVING Count ( * ) >= 3"
  },
  "mapping": {
    "nl_map": {
      "TABLE#0.COLUMN#0": "player attributes shot power",
      "TABLE#0": "player attributes"
    },
    "sql_map": {
      "TABLE#0": "Player_Attributes",
      "TABLE#0.COLUMN#0": "shot_power"
    }
  },
  "col_type_constraint": {
    "TABLE#0.COLUMN#0": [
      "SqlTypeConstraint.REQUESTABLE",
      "SqlTypeConstraint.GROUPABLE"
    ]
  },
  "col_type_assignment": {
    "TABLE#0.COLUMN#0": "SqlColumnTypes.NUMBER"
  },
  "value_bound": { },
  "column_bound": { }
}

Returning to the discussion of FIG. 4A, upon generating the lexicalized training data 4114 at lexicalization stage 4112, the lexicalized training data 4114 can be optionally paraphrased at paraphrasing step 4116 to produce paraphrased lexicalized training data 4118. In some instances, the lexicalized training data 4114 can be clunky. For example, in some instances, some delexicalized utterances in the templates 4110 can include non-terminal symbols corresponding to multiple different columns from the same table, which results in a lexicalized utterance in which the table name is mentioned multiple times. In some instances, clunkiness can be reduced by defining a rule to reduce the number of table name mentions in the lexicalized utterance. For example, a rule may be defined at paraphrasing step 4116 to allow only one mention of the table name, at the first instance of a COLUMN non-terminal symbol in the utterance; for any COLUMN non-terminal symbols that appear later in the utterance, it is assumed that the COLUMN is from the same table as the first instance and an “s” can be added to the COLUMN non-terminal symbol (e.g., COLUMN's), as shown in the sketch below. Additionally, if that particular table is the only table mentioned in the NL examples, the delexicalized columns that come from that table do not need to mention the table name at all. In some instances, clunkiness can be reduced by sending the lexicalized training data 4114 to a crowd worker for paraphrasing.
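
An illustrative sketch of the single-table-mention rule described above, operating on a delexicalized utterance; the symbol encoding (e.g., “TABLE#0.COLUMN#1”) follows the exemplary record above, while the helper itself is hypothetical:

def reduce_table_mentions(delex_utterance, table_sym="TABLE#0"):
    """Keep the table mention only at the first COLUMN non-terminal.

    Later "TABLE#0.COLUMN#k" symbols are rewritten as "COLUMN#k's",
    per the paraphrasing rule described above (illustrative).
    """
    tokens, seen_first = [], False
    for tok in delex_utterance.split():
        if tok.startswith(table_sym + ".COLUMN"):
            if seen_first:
                tok = tok.split(".", 1)[1] + "'s"
            seen_first = True
        tokens.append(tok)
    return " ".join(tokens)

print(reduce_table_mentions(
    "Show TABLE#0.COLUMN#0 and TABLE#0.COLUMN#1 for TABLE#0"))
# "Show TABLE#0.COLUMN#0 and COLUMN#1's for TABLE#0"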

In some instances, the lexicalized training data 4114 and/or the paraphrased lexicalized training data 4118 can be combined with the training data 4102 to produce updated training data 4120. In some instances, the updated training data 4120 can be used to train one or more natural language to logical form (NL-LF) algorithms such as the NL-LF algorithm(s) 4918 described below with respect to FIG. 4F.

Using the foregoing data manufacturing framework based on templates and a SCFG, additional training data can be generated without the time, effort, and money required to gather and clean data under the conventional approaches. Additionally, using the foregoing data manufacturing framework based on templates and a SCFG, models that perform well on select databases and that can generalize well to new and unseen databases can be built.

Framework Based on a Probabilistic Context-Free Grammar and Translator

Synthetic training data that includes NL utterances and corresponding SQL queries can be generated under a data manufacturing framework based on a probabilistic context-free grammar (PCFG) and a translator. The data manufacturing framework can generate synthetic training data by accessing training data that includes NL utterances and corresponding SQL queries, finetuning a translator with the training data, delexicalizing the SQL queries in the training data, generating additional SQL queries using a PCFG, generating lexicalized SQL queries by lexicalizing the delexicalized SQL queries and the additional SQL queries, generating NL utterances from the lexicalized SQL queries using the translator, and forming generated training data (i.e., the synthetic training data) with the generated NL utterances and the lexicalized SQL queries. The original training data can be combined with the synthetic training data to form updated training data. In some instances, prior to combining the training data and the synthetic training data, the generated training data can be paraphrased to form paraphrased generated training data, which can be combined with the original training data to form the updated training data.

FIG. 4C is a simplified logical flow diagram 4300 of an example process for generating training data under a data manufacturing framework based on a PCFG and a translator. The flow starts with accessing the training data 4102 (i.e., the original training data). The training data 4102 has been described above.

Upon accessing the training data 4102, a pre-trained model can be finetuned with the training data 4102 at translator finetuning stage 4304 to form an NL translator 43042. In some instances, the NL translator 43042 can translate SQL queries into utterances. For example, the SQL query “SELECT Count(*) FROM singer” can be translated into the utterance “How many singers are there?” In some instances, the pre-trained model is configured to perform a translation task (e.g., a text-to-text translation task, a string-to-string translation task, and the like). In some instances, the pre-trained model is trained with a set of training utterances, including a set of source utterances and a set of target utterances. An example of a pre-trained model that is configured to perform a translation task is the Text-To-Text Transfer Transformer (T5). Additional information for the T5 translator is found in “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer” by Raffel et al., published in The Journal of Machine Learning Research, the entire contents of which are hereby incorporated by reference as if fully set forth herein.

In some instances, the pre-trained model can be finetuned to form the NL translator 43042 using transfer learning. In some instances, the weights and parameters of the pre-trained model can be adjusted based on the training data 4102 using machine learning optimization techniques (e.g., AdamW). In some instances, the logical forms in the training data 4102 can be set as a source language and the utterances in the training data 4102 can be set as a target language. In some instances, using the pre-trained model, a loss/error can be computed between predictions of the utterances by the pre-trained model and the utterances in the training data 4102, the loss/error can be used to compute gradients, and the gradients can be used to update the model weights and biases of the pre-trained model. In this way, the pre-trained model, which is pre-trained to perform a translation task, can be finetuned to form the NL translator 43042 configured to translate SQL queries to natural language utterances.
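
A hedged sketch of this finetuning loop using the Hugging Face transformers library is shown below; the “translate SQL to English:” task prefix and the single example pair are illustrative assumptions, not part of the disclosed framework:

import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

# Toy source/target pair; in practice these come from training data 4102.
pairs = [("SELECT Count(*) FROM singer", "How many singers are there?")]

model = T5ForConditionalGeneration.from_pretrained("t5-base")
tokenizer = T5TokenizerFast.from_pretrained("t5-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

model.train()
for sql_query, utterance in pairs:
    # SQL is the source language; the utterance is the target language.
    inputs = tokenizer("translate SQL to English: " + sql_query,
                       return_tensors="pt", truncation=True)
    labels = tokenizer(utterance, return_tensors="pt",
                       truncation=True).input_ids
    # Loss between predicted utterance tokens and reference tokens.
    loss = model(**inputs, labels=labels).loss
    loss.backward()         # compute gradients from the loss
    optimizer.step()        # update weights and biases
    optimizer.zero_grad()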

At SQL delexicalization and generation stage 4306, a delexicalized SQL query set 4308 can be generated by delexicalizing the SQL queries in the training data 4102 and generating additional delexicalized SQL queries from the delexicalized SQL queries. Delexicalizing SQL queries has been described above and is not repeated here. In some instances, additional delexicalized SQL queries can be generated from the delexicalized SQL queries by parsing the delexicalized SQL queries into ASTs (described above) and using a PCFG to generate the additional delexicalized SQL queries from the ASTs. A PCFG refers to a context-free grammar in which the sum of the probabilities of all production rules for the same non-terminal symbol is equal to one. A PCFG includes terminal symbols (e.g., x^(1), x^(2), . . . , x^(V)), non-terminal symbols (e.g., N^(1), N^(2), . . . , N^(n)), a start symbol (e.g., N^(1)), a set of rules (e.g., N^(i)→β^(j), where β^(j) is a sequence of terminals and non-terminals), and rule probabilities (e.g., ∀_(i) Σ_(j) P(N^(i)→β^(j))=1). Given a PCFG, the probability of a SQL query comprising tokens w_(1), w_(2), . . . , w_(n) is P(w_(1n))=Σ_(t) P(w_(1n), t), where t ranges over the parse trees (ASTs) of w_(1n). Additional information for generating SQL queries using a PCFG can be found in “Learning to Synthesize Data for Semantic Parsing” by Wang et al., published in arXiv preprint arXiv:2104.05827, the entire contents of which are hereby incorporated by reference as if fully set forth herein.
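
A toy sketch of PCFG-based generation follows; the grammar, its probabilities, and the treatment of delexicalized symbols such as TABLE, COLUMN, and VALUE as terminals are illustrative assumptions of the sketch:

import random

# Toy PCFG: for each non-terminal (a key), rule probabilities sum to one.
# Delexicalized symbols such as TABLE, COLUMN, and VALUE are treated as
# terminals here and lexicalized later.
PCFG = {
    "QUERY": [(0.7, ["SELECT", "COLS", "FROM", "TABLE"]),
              (0.3, ["SELECT", "COLS", "FROM", "TABLE", "WHERE", "COND"])],
    "COLS":  [(0.6, ["COLUMN"]), (0.4, ["Count ( * )"])],
    "COND":  [(1.0, ["COLUMN", "OP", "VALUE"])],
    "OP":    [(0.5, ["="]), (0.25, [">"]), (0.25, ["<"])],
}

def expand(symbol):
    """Recursively expand a symbol into a list of terminal tokens."""
    if symbol not in PCFG:
        return [symbol]
    r, acc = random.random(), 0.0
    for prob, rhs in PCFG[symbol]:
        acc += prob
        if r <= acc:
            return [tok for s in rhs for tok in expand(s)]
    return [symbol]  # unreachable if probabilities sum to one

print(" ".join(expand("QUERY")))
# e.g., "SELECT COLUMN FROM TABLE WHERE COLUMN > VALUE"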

At SQL lexicalization stage 4310, the delexicalized SQL query set 4308 can be lexicalized to form a lexicalized SQL query set 4312. Lexicalization of SQL queries is described above and is not repeated here.

At translation stage 4314, using the NL translator 43042, the lexicalized SQL query set 4312 can be translated into NL utterances to form an NL utterance set, and the NL utterance set can be combined with the lexicalized SQL query set 4312 to form generated training data 4316 (i.e., the synthetic training data). In some instances, each NL utterance of the NL utterance set can form a pair with the respective lexicalized SQL query of the lexicalized SQL query set 4312 used as input to predict the respective NL utterance. In this way, the generated training data 4316 can include NL utterances and corresponding SQL queries that are not in the training data 4102.

Upon generating the generated training data 4316 at translation stage 4314, the generated training data 4316 can be optionally paraphrased at paraphrasing step 4318 to produce paraphrased generated training data 4320. Paraphrasing is described above and not repeated here.

In some instances, the generated training data 4316 and/or the paraphrased generated training data 4320 can be combined with the training data 4102 to produce updated training data 4322. In some instances, the updated training data 4322 can be used to train one or more natural language to logical form (NL-LF) algorithms such as the NL-LF algorithm(s) 4918 described below with respect to FIG. 4F.

Using the foregoing data manufacturing framework based on a PCFG and a translator, additional training data can be generated without the time, effort, and money required to gather and clean data under the conventional approaches. Additionally, using the foregoing data manufacturing framework based on a PCFG and a translator, models that perform well on select databases and that can generalize well to new and unseen databases can be built.

Framework Based on Tree-to-String Translation

Synthetic training data that includes NL utterances and corresponding SQL queries can be generated under a data manufacturing framework based on tree-to-string translation. The data manufacturing framework can generate synthetic training data by accessing training data that includes NL utterances and corresponding SQL queries, parsing and normalizing the SQL queries in the training data into ASTs, delexicalizing the SQL queries in the training data, generating lexicalized SQL queries by lexicalizing the delexicalized SQL queries, generating NL utterances from the ASTs and lexicalized SQL queries using tree-to-string translation, and forming generated training data (i.e., the synthetic training data) with the generated NL utterances and the lexicalized SQL queries. The original training data can be combined with the synthetic training data to form updated training data. In some instances, prior to combining the training data and the synthetic training data, the generated training data can be paraphrased to form paraphrased generated training data, which can be combined with the training data to form the updated training data.

FIG. 4D is a simplified logical flow diagram 4400 of an example process for generating training data under a data manufacturing framework based on tree-to-string translation. The flow starts with accessing the training data 4102 (i.e., the original training data). The training data 4102 has been described above.

Upon accessing the training data 4102, the SQL queries in the training data 4102 can be parsed into ASTs (described above) and the ASTs can be normalized at SQL parsing and normalizing stage 4404. In some instances, in order to normalize the ASTs, an AST including a node having more than two children can be binarized such that each node of the respective AST has no more than two children. For example, as shown in FIG. 4E, an AST 4500 including a node 4510 having more than two children can be binarized such that each node 4510 having more than two children becomes one or more nodes 4520 having no more than two children. In some instances, in order to normalize the ASTs, a unary wrapping can be performed on each AST such that a unary head identifier is applied at each node of the respective AST that corresponds to a non-terminal. For example, as shown in FIG. 4E, in a unary wrapping process 4600, unary head identifiers 4610 can be applied to each node of AST 4620 because each node corresponds to a non-terminal. In some instances, SQL queries that include one or more predefined clauses associated with one or more conditions may not parse correctly (e.g., the SQL AST does not include a child node for the one or more conditions when it should). Examples of such predefined clauses include a WHERE clause, a GROUP clause, an ORDERBY clause, an IEU (intersect, except, union) clause, and the like. For example, the SQL query “SELECT Column FROM Table WHERE one or more conditions” may parse to an AST in which a child node is not generated for each of the one or more conditions. In these cases, the AST is normalized to include one or more deletion nodes for each of the predefined clauses not present in the SQL query such that the AST will have the requested child nodes and possible combinations thereof. For example, as shown in FIG. 4E, in a deletion node addition process 4700, the SQL query 4710, which includes the WHERE clause and a condition (i.e., the terminal token “50”), is represented by the AST 4720 in which deletion nodes 4730 are added with an identifier (e.g., _DELETE_) for the other predefined clauses (e.g., GROUPBY, ORDERBY, IEU, etc.).
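
A minimal sketch of the binarization step, assuming a simple (label, children) tuple encoding of AST nodes that is an assumption of this sketch:

def binarize(node):
    """Rewrite an AST so that no node has more than two children.

    A node is a (label, children) tuple; extra children are folded
    into right-branching intermediate nodes labeled "label'".
    """
    label, children = node
    children = [binarize(c) for c in children]
    while len(children) > 2:
        # Fold the rightmost two children into an intermediate node.
        right = (label + "'", children[-2:])
        children = children[:-2] + [right]
    return (label, children)

# Example: a node with three children becomes two binary nodes.
ast = ("SELECT", [("COLS", []), ("FROM", []), ("WHERE", [])])
print(binarize(ast))
# ("SELECT", [("COLS", []), ("SELECT'", [("FROM", []), ("WHERE", [])])])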

At SQL delexicalization stage 4408, a delexicalized SQL query set 4410 can be generated by delexicalizing the SQL queries in the training data 4102. Delexicalizing SQL queries has been described above and is not repeated here.

At SQL lexicalization stage 4412, the delexicalized SQL query set 4410 can be lexicalized to form a lexicalized SQL query set 4414. Lexicalization of SQL queries is described above and is not repeated here.

At translation stage 4416, NL utterances can be generated for the lexicalized SQL queries in the lexicalized SQL query set 4414. In some instances, an NL utterance can be generated for each lexicalized SQL query in the lexicalized SQL query set 4414. In some instances, the NL utterances can be generated using a tree-to-string (TTS) model 44162. In some instances, the TTS model 44162 generates the NL utterances based on the lexicalized SQL query set 4414, the SQL ASTs 4406, and SQL Grammar 4418. In some instances, the NL utterances can be combined with the lexicalized SQL queries in the lexicalized SQL query set 4414 to form generated training data 4420 (i.e., the synthetic training data). In some instances, each generated NL utterance can form a pair with the respective lexicalized SQL query of the lexicalized SQL query set 4414 that was used by the TTS model 44162 to generate the respective NL utterance. In this way, the generated training data 4420 can include NL utterances and corresponding SQL queries that are not in the training data 4102.

In some instances, the TTS model 44162 generates an NL utterance for each lexicalized SQL query of the lexicalized SQL query set 4414 based on a SQL Grammar 4418. In some instances, the SQL Grammar 4418 can include rules for tree transduction that define rules of source subtree transformations while also defining rules for synchronously generating output strings. In some instances, the SQL Grammar 4418 can include a plurality of SCFG rules, a plurality of tree transduction rules, and a plurality of utterances that correspond to the plurality of SCFG rules and the plurality of tree transduction rules. For example, as shown in FIG. 4E, the SQL Grammar 4418 can include a plurality of SCFG rules 4800, a plurality of tree transduction rules 4820, and a plurality of utterances 4840 that correspond to the plurality of SCFG rules 4800 and the plurality of tree transduction rules 4820. In some instances, the plurality of SCFG rules 4800 define the grammatical structure of the SQL queries in the lexicalized SQL query set 4414. In some instances, each tree transduction rule 4830 of the plurality of tree transduction rules 4820 is a binarized version of a respective SCFG rule 4810 of the plurality of SCFG rules 4800 and defines how a source tree (e.g., an AST) can be transformed into a target tree (e.g., an utterance tree). In some instances, each utterance 4850 of the plurality of utterances 4840 corresponds to a respective SCFG rule 4810 of the plurality of SCFG rules 4800 and a respective tree transduction rule 4830 of the plurality of tree transduction rules 4820. In some instances, as shown in FIG. 4E, the plurality of SCFG rules 4800 is grouped by their head (i.e., the symbol on the left of the arrow), where each SCFG rule 4810 has a matching source and target. In some instances, a rule may include a non-terminal. In some instances, the non-terminal can be a variable that is rewritten for the utterance 4850 based on a respective tree transduction rule of the plurality of learned tree transduction rules 4820. In some instances, the plurality of SCFG rules 4800 can be learned from the training data 4102 (described above). In other instances, the plurality of SCFG rules 4800 can be the Abstract Syntax Description Language (ASDL) for the Spider database (described above). Additional information for the Spider ASDL is found in “LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations” by Ruisheng et al., published in arXiv preprint arXiv:2106.01093, the entire contents of which are hereby incorporated by reference as if fully set forth herein.

In some instances, the TTS model 44162 generates an NL utterance for each lexicalized SQL query of the lexicalized SQL query set 4414 by (a) reordering nodes of the AST for the respective lexicalized SQL query, based on the plurality of SCFG rules 4800 and the plurality of tree transduction rules 4820, into a reordered AST for the respective lexicalized SQL query and (b) decoding the reordered AST into an NL utterance for the respective lexicalized SQL query based on the respective lexicalized SQL query. In some instances, in order to reorder the nodes of an AST, each SCFG rule of the plurality of SCFG rules can be applied to each child node of the respective AST to identify which nodes correspond to the non-terminals of the plurality of SCFG rules, and each tree transduction rule of the plurality of tree transduction rules can be applied to each identified node to reorder the nodes to match the structure defined by the plurality of tree transduction rules. For example, an AST for a SQL query can be arranged with a SELECT node representing a SELECT operator in the SQL query in a first layer of the AST, a COLS node representing column(s) in the SQL query in a leftmost branch of a second layer of the AST, a FROM node representing a FROM operator in the SQL query in a middle branch of the second layer of the AST, and a WHERE node representing a WHERE operator in the SQL query in a rightmost branch of the second layer of the AST. The AST can further include a third layer for each node in the second layer, and each node in the third layer can represent one or more targets of the nodes of the second layer. For example, the COLS node can include a child node in the third layer that represents the column referenced in the SQL query, the FROM node can include a child node in the third layer that represents the table referenced in the SQL query, and the WHERE node can include a child node in the third layer that represents a value in the column referenced in the SQL query.

The AST can be reordered based on the SCFG rules and tree transduction rules. In other words, a new AST can be generated in which one or more child nodes of an existing branch can be moved to another branch and/or a new branch, and/or an order of the branches can be rearranged (e.g., a leftmost branch becomes a rightmost branch). In some instances, the AST can be reordered by finding all the transduction rules that can apply to an existing tree. An SCFG rule and corresponding tree transduction rule apply when the head of the SCFG rule matches a node in the AST and the state of the node matches the state of the SCFG rule. In some instances, when a portion of the AST matches an SCFG rule, the corresponding tree transduction rule is applied, and a new AST subtree is generated and added to the AST by replacing the branch (or subtree) that matched the SCFG rule with the right-hand side of the SCFG rule as structured by the tree transduction rule. A complete tree transduction begins with the root node being in the initial state (e.g., the SELECT node) and then continues with states beneath the root node by propagating down the tree to the leaves until the entire tree has been transduced.
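
Purely as an illustration of this top-down process, the following sketch applies per-head rewrite functions to an AST encoded as (label, children) tuples; the rule encoding is an assumption of the sketch, not the actual SQL Grammar 4418:

def transduce(node, rules):
    """Top-down tree transduction, beginning at the root.

    rules: mapping from a node label (the SCFG rule head) to a
    function that rebuilds that node's subtree in target order.
    """
    label, children = node
    if label in rules:
        # Apply the matching transduction rule to reorder this
        # subtree, then continue down toward the leaves.
        label, children = rules[label]((label, children))
    return (label, [transduce(c, rules) for c in children])

# Illustrative rule: emit the WHERE branch before the COLS branch.
rules = {"SELECT": lambda n: (n[0], [n[1][2], n[1][0], n[1][1]])}
ast = ("SELECT", [("COLS", []), ("FROM", []), ("WHERE", [])])
print(transduce(ast, rules))
# ("SELECT", [("WHERE", []), ("COLS", []), ("FROM", [])])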

In some instances, the TTS model 44162 can reorder the AST prior to decoding the reordered AST. In other instances, the AST can be reordered and decoded concurrently. In some instances, the TTS model 44162 can be trained with a dataset that includes SQL queries, translated NL utterances for those SQL queries, a tree transduction rule set, and an index that associates one or more tree transduction rules of the tree transduction rule set with each SQL query and its corresponding translated NL utterance. Advantageously, the TTS model 44162 can store the SQL Grammar efficiently, match input ASTs efficiently, produce NL utterances efficiently, produce a list of traces of tree transduction rules, and produce a list of possible NL utterances in the form of an n-best list and/or an exhaustive list. In some instances, the TTS model 44162 can be the Travatar Translation Engine. Additional information for the Travatar tree transducer is found in “Travatar: A Forest-to-String Machine Translation Engine based on Tree Transducers” by Neubig, published in Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, the entire contents of which are hereby incorporated by reference as if fully set forth herein.

Upon generating the generated training data 4420 at translation stage 4416, the generated training data 4420 can be optionally paraphrased at paraphrasing step 4422 to produce paraphrased generated training data 4424. Paraphrasing is described above and not repeated here.

In some instances, the generated training data 4420 and/or the paraphrased generated training data 4424 can be combined with the training data 4102 to produce updated training data 4426. In some instances, the updated training data 4426 can be used to train one or more natural language to logical form (NL-LF) algorithms such as the NL-LF algorithm(s) 4918 described below with respect to FIG. 4F.

Using the foregoing data manufacturing framework based on tree-to-string translation, additional training data can be generated without the time, effort, and money required to gather and clean data under the conventional approaches. Additionally, using the foregoing data manufacturing framework based on tree-to-string translation, models that perform well on select databases and that can generalize well to new and unseen databases can be built.

Model System

FIG. 4F shows a block diagram illustrating aspects of a model system 4900 configured to train and deploy machine learning models 4924 (e.g., natural language to logical form translator models that may be used by a digital assistant or chatbot as described with respect to FIGS. 1-3). The model system 4900 in this example includes various stages: a training stage 4910 to train the machine learning models for translating natural language to a logical form such as SQL, a logical form inference stage 4920 to translate natural language utterances into logical form queries, and a query stage 4930 to execute logical form queries on a system such as a relational database system.

To train the various machine learning models 4924, the training stage 4910 is comprised of two main subsystems or services: dataset preparer 4914 and model trainer 4916. The dataset preparer 4914 facilitates the process of loading data assets 4912, splitting the data assets 4912 into training and validation sets so that the system can train and test the machine learning models 4924, and performing basic natural language pre-processing (e.g., standardization, normalization, tokenizing data, annotation, augmentation, embedding, etc.). The data assets 4912 include natural language utterances (e.g., natural language questions/requests) and their corresponding logical forms (e.g., statements/queries such as SQL queries). In some instances, the data assets 4912 can be accessed from one or more sources such as a database (not shown), a computing system (e.g., a data preprocessing subsystem), or the like. In some instances, the data assets 4912 are provided by a client or customer. In some instances, the data assets 4912 can be obtained using any of the data manufacturing frameworks described above with respect to FIGS. 4A-4E.

Once the data assets 4912 are obtained, the datasets may be split into training and validation datasets. The splitting may be performed randomly (e.g., 90/10% or 70/30%), or the splitting may be performed in accordance with a more complex validation technique such as K-Fold Cross-Validation, Leave-one-out Cross-Validation, Leave-one-group-out Cross-Validation, Nested Cross-Validation, or the like to minimize sampling bias and overfitting. Before or after splitting, basic natural language pre-processing may be performed on the data assets 4912. In some instances, the pre-processing includes tokenizing the utterances. Tokenizing is splitting a phrase, sentence, paragraph, or an entire text document into smaller units, such as individual words or terms. Each of these smaller units is called a token. Smaller units are created by locating boundaries such as word boundaries, which are the ending point of a word and the beginning of the next word. For example, the text “How many employees work for company X” can be word tokenized into ‘How’, ‘many’, ‘employees’, ‘work’, ‘for’, ‘company’, ‘X’. These tokens help the model to understand the context and develop the model for a given task. There are various tokenization techniques which can be used for executing the tokenizing based on the language and modeling task. For example, the tokenizing may be performed using the Natural Language Toolkit, white space tokenization, dictionary-based tokenization, rule-based tokenization, Keras tokenization, Penn Tree based tokenization, spaCy tokenization, Moses tokenization, subword tokenization, or the like.
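
For example, white space tokenization and Natural Language Toolkit tokenization of the sentence above can be sketched as follows (the one-time punkt download is an NLTK setup detail, not part of the disclosed system):

import nltk
nltk.download("punkt", quiet=True)  # NLTK tokenizer models (one-time;
                                    # newer NLTK versions use "punkt_tab")
from nltk.tokenize import word_tokenize

text = "How many employees work for company X"
print(text.split())         # white space tokenization
print(word_tokenize(text))  # Natural Language Toolkit tokenization
# Both produce: ['How', 'many', 'employees', 'work', 'for', 'company', 'X']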

In some instances, the tokens for data assets 4912 may then be embedded to word embeddings. A word embedding is a learned representation for text where words that have the same meaning have a similar representation. Word embeddings are generated by embedding techniques where individual words are represented as real-valued vectors in a predefined vector space so they can be understood by deep learning algorithms. The embedding techniques can be joint or individual embedding techniques, such as including an embedding layer within the deep learning algorithm or using a separate model such as Word2Vec or GloVe. An embedding layer is a word embedding that is learned jointly with a neural network model on a specific natural language processing task, such as the natural language to logical form translation (e.g., the NL-LF algorithm(s) 4918). Word2Vec is a statistical technique that uses a model such as the Continuous Bag-of-Words or Continuous Skip-Gram Model for learning a standalone word embedding from a text corpus. GloVe, for Global Vectors, is a model for creating word embeddings based on global corpus statistics. It is trained on the non-zero entries of a global word-word co-occurrence matrix, which tabulates how frequently words co-occur with one another in a given corpus.
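
As an illustrative sketch, a standalone Skip-Gram embedding can be learned with the gensim implementation of Word2Vec; the toy corpus and parameter values below are assumptions, not part of the disclosed system:

from gensim.models import Word2Vec

# Toy corpus of tokenized utterances (illustrative only).
sentences = [
    ["how", "many", "employees", "work", "for", "company", "x"],
    ["show", "all", "employees", "of", "company", "x"],
]
# sg=1 selects the Continuous Skip-Gram model; sg=0 selects CBOW.
model = Word2Vec(sentences, vector_size=50, window=3,
                 min_count=1, sg=1, epochs=50)
vector = model.wv["employees"]   # 50-dimensional real-valued vector
print(vector.shape)              # (50,)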

The natural language-logical form (NL-LF) algorithm(s) 4918 are trained by model trainer 4916 using the preprocessed data assets 4912 (e.g., tokenized data assets). In some instances, the NL-LF algorithm(s) 4918 comprise an encoder-decoder neural network. The encoder is comprised of an input layer and one or more encoding layers. The one or more encoding layers may include multiple recurrent units such as Long Short-Term Memory (LSTM), where each recurrent unit gets input in the form of a single element of the input sequence, gathering data for that specific element and propagating it forward. The encoder follows an embedding procedure to transform the relevant text (and optionally the database schema) into a number/vector representation to conserve the conditions and connections between words and sentences, such that a machine can comprehend the pattern associated with any text, make out the context of the sentences, and optionally learn relationships between words and a given database schema. The result of the encoder will be a state vector or context vector. This state vector will be the input for the decoder. The decoder is comprised of an input layer, one or more decoding layers, a dense layer, and an output layer (e.g., a layer with a softmax function). The one or more decoding layers may include multiple recurrent units such as LSTM in which an output for every time step is predicted. The current recurrent unit accepts a hidden state from the earlier recurrent unit. The result of the decoder will be a logical form such as a SQL query translated from an utterance within the preprocessed data assets 4912. Examples of a trained model 4924 such as an NL-LF model include, but are not limited to, RAT-SQL and DuoRAT. Additional information for the RAT-SQL model is found in “RAT-SQL: Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers” by Wang et al., published in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, the entire contents of which are hereby incorporated by reference as if fully set forth herein. Additional information for the DuoRAT model is found in “DuoRAT: Towards Simpler Text-to-SQL Models” by Scholak et al., published in Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics, the entire contents of which are hereby incorporated by reference as if fully set forth herein.
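
A compact sketch of such an encoder-decoder in PyTorch appears below; the vocabulary sizes, dimensions, and random inputs are placeholders, and teacher forcing with a cross-entropy loss over the output logits is assumed rather than shown:

import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """LSTM encoder-decoder: utterance tokens in, SQL tokens out."""
    def __init__(self, src_vocab, tgt_vocab, dim=256):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, dim)  # embedding layer
        self.tgt_embed = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.decoder = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)  # dense layer over SQL vocab

    def forward(self, src_ids, tgt_ids):
        # The encoder's final state is the context vector for the decoder.
        _, state = self.encoder(self.src_embed(src_ids))
        # The decoder predicts an output token at every time step.
        dec_out, _ = self.decoder(self.tgt_embed(tgt_ids), state)
        return self.out(dec_out)  # logits; softmax is applied in the loss

model = Seq2Seq(src_vocab=10000, tgt_vocab=5000)
logits = model(torch.randint(0, 10000, (1, 7)),   # one 7-token utterance
               torch.randint(0, 5000, (1, 9)))    # one 9-token SQL query
print(logits.shape)  # torch.Size([1, 9, 5000])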

The model training includes selecting hyperparameters for the model 4924 and using an optimization algorithm (e.g., a stochastic gradient descent algorithm or a variant thereof such as batch gradient descent or minibatch gradient descent) to find the model parameters that correspond to the best fit between predicted and actual outputs. The hyperparameters are settings that can be tuned or optimized to control the behavior of the model 4924. Most models explicitly define hyperparameters that control different aspects of the models such as memory or cost of execution. However, additional hyperparameters may be defined and optimized to adapt a model to a specific scenario. For example, the hyperparameters may include the number of hidden units of a model, the learning rate of a model, the convolution kernel width, or the number of kernels for a model.

During training, error is calculated as the difference between the actual output and the predicted output. The function that is used to compute this error is known as an objective function (e.g., a loss function or a cost function). Error is a function of internal parameters of the model, e.g., weights and bias. For accurate predictions, the error needs to be minimized. In order to minimize the error, the model parameters are incrementally updated by minimizing the objective function over the training examples from the preprocessed data assets 4912. The objective function can be constructed to measure the difference between the outputs inferred using the models and the ground truth annotated to the samples using the labels. For example, for a supervised learning-based model, the goal of the training is to learn a function “h( )” (also sometimes referred to as the hypothesis function) that maps the training input space X to the target value space Y, h: X→Y, such that h(x) is a good predictor for the corresponding value of y. Various different techniques may be used to learn this hypothesis function. In some machine learning algorithms, such as a neural network, this is done using back propagation. The current error is typically propagated backwards to a previous layer, where it is used to modify the weights and bias in such a way that the error is minimized. The weights are modified using the optimization function. Optimization functions usually calculate the error gradient, i.e., the partial derivative of the objective function with respect to the weights, and the weights are modified in the opposite direction of the calculated error gradient. For example, techniques such as back propagation, random feedback, Direct Feedback Alignment (DFA), Indirect Feedback Alignment (IFA), Hebbian learning, and the like are used to update the model parameters in such a manner as to minimize or maximize this objective function. This cycle is repeated until the minimum of the objective function is reached.

Once a set of model parameters is identified by the model trainer 4916, the model 4924 has been trained, and a validator is configured to validate the model 4924 using the validation datasets. The validation process performed by the validator includes iterative operations of inputting the validating datasets into the model 4924 using a validation technique such as K-Fold Cross-Validation, Leave-one-out Cross-Validation, Leave-one-group-out Cross-Validation, Nested Cross-Validation, or the like to tune the hyperparameters and ultimately find the optimal set of hyperparameters. Once the optimal set of hyperparameters is obtained, a reserved test set of data from the validating datasets is input into the model 4924 to obtain output, and the output is evaluated versus ground truth values using correlation techniques such as the Bland-Altman method and Spearman's rank correlation coefficient, and by calculating performance metrics such as the error, accuracy, precision, recall, receiver operating characteristic curve (ROC), etc. In some instances, the obtaining, training, and validating data processes in the model system 4900 can be repeatedly performed (adjusted) by the model trainer 4916 until a predetermined condition is satisfied and a set of model parameters can be provided by the model trainer 4916.

As should be understood, other training/validation mechanisms are contemplated and may be implemented within the model system 4900. For example, the model 4924 may be trained and hyperparameters may be tuned on datasets from the subset of obtained or filtered datasets, and the datasets from the subset of obtained or filtered datasets may only be used for testing and evaluating performance of the model 4924. Moreover, although the training mechanisms described herein focus on training a new model 4924, these training mechanisms can also be utilized to finetune existing models trained from other datasets. For example, in some instances, a model 4924 might have been pre-trained using datasets from different modalities or tasks. In those cases, the models 4924 can be used for transfer learning and retrained/validated using the training and validating data.

The training stage 4910 outputs a trained model 4924 with an optimized set of model parameters and hyperparameters for use in the inference stage 4920. The inference stage 4920 comprises a predictor 4928 for translating natural language to a logical form. For example, the predictor 4928 executes processes for inputting natural language utterance(s) 4922, such as a non-follow-up utterance, one or more follow-up utterances, or a combination thereof, into the trained model 4924, and generating, using the trained model 4924, a prediction for a logical form 4926 based on features within the natural language utterance(s) 4922. The inference stage 4920 outputs a prediction for the logical form 4926 for optional use in query stage 4930. The query stage 4930 comprises one or more executors 4932 configured for executing the logical form 4926 on a system such as database 4934 to obtain a result 4936 (e.g., an answer to a query within utterance(s) 4922). For example, the one or more executors 4932 may be configured to execute a SQL query on a relational database to obtain an answer to a query posed in the natural language utterance(s) 4922.

While not explicitly shown, it will be appreciated that the model system 4900 may further include a developer device associated with a developer. Communications from a developer device to components of the model system 4900 may indicate what types of input data, utterances, and/or database schema are to be used for the models, a number and type of models to be used, hyperparameters of each model (for example, learning rate and number of hidden layers), how data requests are to be formatted, which training data is to be used (e.g., and how to gain access to the training data), which validation technique is to be used, and/or how the controller processes are to be configured.

Illustrative Methods

FIG. 5A illustrates an example process 5100 for synthesizing synthetic training data based on templates and a SCFG. The processing depicted in FIG. 5A may be implemented in software (e.g., code, instructions, a program) executed by one or more processing units (e.g., one or more processors, cores) of the respective systems, hardware, or combinations thereof described throughout. The software may be stored on a non-transitory storage medium (e.g., on a memory device). Although the methods presented in FIG. 5A depict the various processing steps occurring in a particular sequence or order, this is not intended to be limiting. In certain alternative embodiments, the steps may be performed in parallel and/or in a different order. In certain embodiments, such as in the embodiment depicted in FIGS. 1-4F, the processing depicted in FIG. 5A may be performed by a pre-processing subsystem (e.g., pre-processing subsystem 210) and/or the model system 4900.

At block 5102, original training data is accessed. In some instances, the original training data includes a plurality of utterances and a plurality of logical forms (e.g., SQL queries) with each logical form of the plurality of logical forms corresponding to at least one utterance of the plurality of utterances. In some instances, the original training data includes database schema information for one or more databases.

At block 5104, a plurality of templates is generated. In some instances, each template of the plurality of templates includes a delexicalized version of an utterance in the plurality of utterances and a delexicalized version of a logical form corresponding to the utterance. In some instances, the plurality of templates can be generated automatically from the utterances and their corresponding logical forms in the original training data using a trained machine learning model. In some instances, the machine learning model is trained to perform approximate string matching. In some instances, the trained machine learning model can predict which words in a respective utterance correspond to table names, table column names, and column values in the database schema information and replace those words with non-terminal symbols. In some instances, the plurality of templates can be generated by a user based on a rules scheme.

At block 5106, a grammar is learned from the plurality of logical forms. In some instances, the grammar defines a plurality of production rules for lexicalizing the plurality of templates. In some instances, the grammar is an SCFG. In some instances, the grammar can be learned based on the utterances and database schema information included in the original training data. In some instances, the grammar can be learned by setting table names, column names, and values in the database schema information as non-terminal symbols, setting logical form operators such as SQL operators (e.g., Max, Min, =, Like, etc.) as non-terminal symbols, setting logical form functions such as SQL functions (e.g., Avg, Count, First, Last, etc.) as non-terminal symbols, and generating one or more production rules. In some instances, the one or more production rules are generated by replacing one or more words, entities, or phrases in the utterances in the original training data with the set non-terminal symbols.

At block 5108, synthetic training data is generated. In some instances, the synthetic training data is generated by parsing each template of the plurality of templates, sampling a database to identify a plurality of sampling components, and lexicalizing each template of the plurality of templates with at least one sampling component of the plurality of sampling components. In some instances, the plurality of templates can be parsed using a parsing algorithm and the SCFG. In some instances, the parsing algorithm can apply the SCFG to each delexicalized utterance and its corresponding delexicalized logical form to generate an AST for each parsed delexicalized utterance and its corresponding parsed delexicalized logical form in which their respective logical syntactic components are identified and represented in the AST.

In some instances, in order to sample a database to identify a plurality of sampling components, each template in the plurality of templates is analyzed to identify one or more constraints in the respective template, a database is analyzed to identify its components, database components are sampled based on the identified one or more constraints in each template, and the non-terminal symbols in each delexicalized utterance and its corresponding delexicalized logical form are replaced with the sampled components. In some instances, components are sampled from the database based on a database analysis of the database and the analyzed templates. In some instances, the database is one or more relational databases with each database having components (e.g., tables, columns, and values). In some instances, a database analysis is performed on the one or more databases to identify their components. In some instances, components are sampled from the database based on the analysis of the non-terminal symbols in the analyzed templates. In some instances, each parsed template of the plurality of parsed templates can be lexicalized with the sampled components for the respective parsed template to produce lexicalized training examples (i.e., the synthetic training examples). In some instances, a parsed utterance and its corresponding parsed logical form in the parsed templates can be lexicalized by replacing the non-terminal symbols of the utterance and its corresponding logical form with the components of the selected database sampled for the respective parsed utterance and logical form.

In some instances, each lexicalized training example of the lexicalized training examples can be validated; lexicalized training examples that are valid can be included in the lexicalized training data (i.e., the synthetic training data) and lexicalized training examples that are not valid can be discarded (i.e., the discarded training examples). In some instances, in order to validate each lexicalized training example, a constraint check can be performed on each lexicalized training example. In some instances, the constraint check is performed by executing the respective lexicalized logical form against a database in the one or more databases.

Upon generating the lexicalized training data (i.e., the synthetic training data), the lexicalized training data can be paraphrased to produce paraphrased lexicalized training data. In some instances, the lexicalized training data and/or the paraphrased lexicalized training data can be combined with the original training data to produce updated training data.

FIG. 5B illustrates an example process 5200 for synthesizing synthetic training data based on a probabilistic context-free grammar and a statistical translator. The processing depicted in FIG. 5B may be implemented in software (e.g., code, instructions, a program) executed by one or more processing units (e.g., one or more processors, cores) of the respective systems, hardware, or combinations thereof described throughout. The software may be stored on a non-transitory storage medium (e.g., on a memory device). Although the methods presented in FIG. 5B depict the various processing steps occurring in a particular sequence or order, this is not intended to be limiting. In certain alternative embodiments, the steps may be performed in parallel and/or in a different order. In certain embodiments, such as in the embodiment depicted in FIGS. 1-4F, the processing depicted in FIG. 5B may be performed by a pre-processing subsystem (e.g., pre-processing subsystem 210) and/or the model system 4900.

At block 5202, original training data is accessed. In some instances, the original training data includes a plurality of utterances and a plurality of logical forms (e.g., SQL queries) with each logical form of the plurality of logical forms corresponding to at least one utterance of the plurality of utterances. In some instances, the original training data includes database schema information for one or more databases.

At block 5204, a pre-trained model is obtained. In some instances, the pre-trained model is trained to translate utterances to logical forms. In some instances, the pre-trained model is a text-to-text transfer transformer.

At block 5206, the pre-trained model is finetuned. In some instances, the pre-trained model is finetuned to translate logical forms to utterances. In some instances, the finetuning is performed using the original training data and generates a finetuned model. In some instances, the pre-trained model can be finetuned using transfer learning. In some instances, weights and parameters of the pre-trained model can be adjusted based on the original training data using one or more machine learning optimization techniques (e.g., AdamW). In some instances, using the pre-trained model, a loss/error is computed, the loss/error is used to compute gradients, and the gradients are used to update the model weights and biases of the pre-trained model.

At block 5208, a set of delexicalized logical forms is generated. In some instances, the set of delexicalized logical forms includes delexicalized versions of the plurality of logical forms. In some instances, the delexicalized versions of the plurality of logical forms can be generated automatically from the utterances and their corresponding logical forms in the original training data using a trained machine learning model. In some instances, the machine learning model is trained to perform approximate string matching. In some instances, the trained machine learning model can predict which words in a respective utterance correspond to table names, table column names, and column values in the database schema information and replace those words with non-terminal symbols. In some instances, the delexicalized versions of the plurality of logical forms can be generated by a user based on a rules scheme. In some instances, the set of delexicalized logical forms includes delexicalized logical forms generated using a PCFG. In some instances, the delexicalized logical forms generated using the PCFG can be generated by parsing the delexicalized versions of the plurality of logical forms into ASTs and using the PCFG to generate the additional delexicalized logical forms from the ASTs.

At block 5210, a set of lexicalized logical forms is generated. In some instances, the set of lexicalized logical forms is generated by lexicalizing the set of delexicalized logical forms. In some instances, lexicalizing the set of delexicalized logical forms includes parsing each delexicalized logical form in the set of delexicalized logical forms, sampling a database to identify a plurality of sampling components, and lexicalizing each parsed delexicalized logical form with at least one sampling component of the plurality of sampling components. In some instances, the set of delexicalized logical forms can be parsed using a parsing algorithm and an SCFG. In some instances, the parsing algorithm can apply the SCFG to each delexicalized logical form to generate an AST for each parsed delexicalized logical form in which its logical syntactic components are identified and represented in the AST.

In some instances, in order to sample a database to identify a plurality of sampling components, each parsed delexicalized logical form is analyzed to identify one or more constraints in the respective parsed delexicalized logical form, a database is analyzed to identify its components, database components are sampled based on the identified one or more constraints in each parsed delexicalized logical form, and the non-terminal symbols in each parsed delexicalized logical form are replaced with the sampled components. In some instances, components are sampled from the database based on a database analysis of the database and the analyzed parsed delexicalized logical forms. In some instances, the database is one or more relational databases with each database having components (e.g., tables, columns, and values). In some instances, a database analysis is performed on the one or more databases to identify their components. In some instances, components are sampled from the database based on the analysis of the non-terminal symbols in the analyzed parsed delexicalized logical forms. In some instances, each parsed delexicalized logical form can be lexicalized with the sampled components for the respective parsed delexicalized logical form to produce a set of lexicalized logical forms. In some instances, a parsed delexicalized logical form can be lexicalized by replacing the non-terminal symbols of the parsed delexicalized logical form with the components of the selected database sampled for the respective parsed delexicalized logical form.

At block 5212, synthetic training data is generated. In some instances, the synthetic training data is generated by the finetuned model. In some instances, the synthetic training data includes an utterance for each lexicalized logical form of the set of lexicalized logical forms. In some instances, using the finetuned model, the set of lexicalized logical forms can be translated into NL utterances to form an NL utterance set, and the NL utterance set can be combined with the set of lexicalized logical forms to form generated training data (i.e., the synthetic training data). In some instances, each NL utterance of the NL utterance set can form a pair with the respective lexicalized logical form of the set of lexicalized logical forms used to translate the respective NL utterance. Upon generating the synthetic training data, the synthetic training data can be paraphrased to produce paraphrased synthetic training data. In some instances, the synthetic training data and/or the paraphrased synthetic training data can be combined with the original training data to produce updated training data.

FIG. 5C illustrates an example process 5300 for synthesizing synthetic training data based on tree-to-string translation. The processing depicted in FIG. 5C may be implemented in software (e.g., code, instructions, a program) executed by one or more processing units (e.g., one or more processors, cores) of the respective systems, hardware, or combinations thereof described throughout. The software may be stored on a non-transitory storage medium (e.g., on a memory device). Although the methods presented in FIG. 5C depict the various processing steps occurring in a particular sequence or order, this is not intended to be limiting. In certain alternative embodiments, the steps may be performed in parallel and/or in a different order. In certain embodiments, such as in the embodiment depicted in FIGS. 1-4F, the processing depicted in FIG. 5C may be performed by a pre-processing subsystem (e.g., pre-processing subsystem 210) and/or the model system 4900.

At block 5302, original training data is accessed. In some instances, the original training data includes a plurality of utterances and a plurality of logical forms with each logical form of the plurality of logical forms corresponding to at least one utterance of the plurality of utterances. In some instances, the original training data includes database schema information for one or more databases.

At block 5304, a set of ASTs is generated for the plurality of logical forms. In some instances, the set of ASTs is generated by parsing each logical form of the plurality of logical forms into an AST and normalizing the respective AST. In some instances, in order to normalize the ASTs, an AST including a node having more than two children can be binarized such that each node of the respective AST has no more than two children. In some instances, in order to normalize the ASTs, a unary wrapping can be performed on each AST such that a unary head identifier is applied at each node of the respective AST that corresponds to a non-terminal. In some instances, in order to normalize the ASTs, one or more deletion nodes will be added to a respective AST for each predefined clause of the predefined clauses not present in the respective SQL query such that the AST will have the requested child nodes and possible combinations thereof. Examples of such predefined clauses include a WHERE clause, a GROUP clause, an ORDERBY clause, an IEU (intersect, except, union) clause, and the like.

At block 5306, a set of delexicalized logical forms is generated. In some instances, each delexicalized logical form of the set of delexicalized logical forms is a delexicalized version of a logical form of the plurality of logical forms. In some instances, the set of delexicalized logical forms includes delexicalized versions of the plurality of logical forms. In some instances, the delexicalized versions of the plurality of logical forms can be generated automatically from the utterances and their corresponding logical forms in the original training data using a trained machine learning model. In some instances, the machine learning model is trained to perform approximate string matching. In some instances, the trained machine learning model can predict which words in a respective utterance correspond to table names, table column names, and column values in the database schema information and replace those words with non-terminal symbols.

At block 5308, a set of lexicalized logical forms is generated. In some instances, the set of lexicalized logical forms is generated by lexicalizing the set of delexicalized logical forms. In some instances, lexicalizing the set of delexicalized logical forms includes parsing each delexicalized logical form in the set of delexicalized logical forms, sampling a database to identify a plurality of sampling components, and lexicalizing each parsed delexicalized logical form with at least one sampling component of the plurality of sampling components. In some instances, the set of delexicalized logical forms can be parsed using a parsing algorithm and an SCFG. In some instances, the parsing algorithm can apply the SCFG to each delexicalized logical form to generate an AST for each parsed delexicalized logical form in which its logical syntactic components are identified and represented in the AST.

In some instances, in order to sample a database to identify a plurality of sampling components, each parsed delexicalized logical form is analyzed to identify one or more constraints in the respective parsed delexicalized logical form, a database is analyzed to identify its components, database components are sampled based on the identified one or more constraints in each parsed delexicalized logical form, and the non-terminal symbols in each parsed delexicalized logical form are replaced with the sampled components. In some instances, components are sampled from the database based on a database analysis of the database and the analyzed parsed delexicalized logical forms. In some instances, the database is one or more relational databases with each database having components (e.g., tables, columns, and values). In some instances, a database analysis is performed on the one or more databases to identify their components. In some instances, components are sampled from the database based on the analysis of the non-terminal symbols in the analyzed parsed delexicalized logical forms. In some instances, each parsed delexicalized logical form can be lexicalized with the sampled components for the respective parsed delexicalized logical form to produce a set of lexicalized logical forms. In some instances, a parsed delexicalized logical form can be lexicalized by replacing the non-terminal symbols of the parsed delexicalized logical form with the components of the selected database sampled for the respective parsed delexicalized logical form.

At block 5310, synthetic training data is generated. In some instances, the synthetic training data is generated using a TTS model. In some instances, the synthetic training data includes an utterance for each lexicalized logical form of the set of lexicalized logical forms. In some instances, the TTS model generates the NL utterances based on the set of lexicalized logical forms, the set of ASTs, and a SQL grammar. In some instances, the SQL grammar can include rules for tree transduction that define rules of source subtree transformations while also defining rules for synchronously generating output strings. In some instances, the SQL grammar can include a plurality of SCFG rules, a plurality of tree transduction rules, and a plurality of utterances that correspond to the plurality of SCFG rules and the plurality of tree transduction rules. In some instances, the plurality of SCFG rules define the grammatical structure of the SQL queries in the lexicalized SQL query set. In some instances, each tree transduction rule of the plurality of tree transduction rules is a binarized version of a respective SCFG rule of the plurality of SCFG rules and defines how a source tree (e.g., an AST) can be transformed into a target tree (e.g., an utterance tree). In some instances, each utterance of the plurality of utterances corresponds to a respective SCFG rule of the plurality of SCFG rules and a respective tree transduction rule of the plurality of tree transduction rules. In some instances, the plurality of SCFG rules is grouped by their head (i.e., the symbol on the left of the arrow), where each SCFG rule has a matching source and target. In some instances, a rule may include a non-terminal. In some instances, the non-terminal can be a variable that is rewritten for the utterance based on a respective tree transduction rule of the plurality of learned tree transduction rules. In some instances, the plurality of SCFG rules can be learned from the training data.
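
One possible in-memory representation of such rules, with the source and target sides kept in sync and the rules grouped by head, is sketched below; the SCFGRule class and the example rule are illustrative assumptions, not rules from the disclosure.

    from dataclasses import dataclass
    from typing import Dict, List

    @dataclass
    class SCFGRule:
        head: str          # the symbol on the left of the arrow
        source: List[str]  # SQL-side right-hand side, may contain non-terminals
        target: List[str]  # utterance-side right-hand side, synchronized with source

    rule = SCFGRule(
        head="QUERY",
        source=["SELECT", "COLS", "FROM", "TABLE", "WHERE", "COND"],
        target=["show", "the", "COLS", "of", "TABLE", "with", "COND"],
    )

    def group_by_head(rules: List[SCFGRule]) -> Dict[str, List[SCFGRule]]:
        # Group the SCFG rules by their head symbol, as described above.
        grouped: Dict[str, List[SCFGRule]] = {}
        for r in rules:
            grouped.setdefault(r.head, []).append(r)
        return grouped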

In some instances, the TTS model generates an NL utterance for each lexicalized SQL query of the lexicalized SQL query set by (a) reordering nodes of the AST for the respective lexicalized SQL query based on the plurality of SCFG rules and the plurality of tree transduction rules into a reordered AST for the respective lexicalized SQL query and (b) decoding the reordered AST into an NL utterance for the respective lexicalized SQL query based on the respective lexicalized SQL query. In some instances, in order to reorder the nodes of an AST, each SCFG rule of the plurality of SCFG rules can be applied to each child node of the respective AST to identify which nodes correspond to the non-terminals of the plurality of SCFG rules, and each tree transduction rule of the plurality of tree transduction rules can be applied to each identified node to reorder the nodes to match the structure defined by the plurality of tree transduction rules. For example, an AST for a SQL query can be arranged with a SELECT node representing a SELECT operator in the SQL query in a first layer of the AST, a COLS node representing column(s) in the SQL query in a leftmost branch of a second layer of the AST, a FROM node representing a FROM operator in the SQL query in a middle branch of the second layer of the AST, and a WHERE node representing a WHERE operator in the SQL query in a rightmost branch of the second layer of the AST. The AST can further include a third layer for each node in the second layer, and each node in the third layer can represent one or more targets of the nodes of the second layer. For example, the COLS node can include a child node in the third layer that represents the column referenced in the SQL query, the FROM node can include a child node in the third layer that represents the table referenced in the SQL query, and the WHERE node can include a child node in the third layer that represents a value in the column referenced in the SQL query.
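
Written out with the hypothetical Node type from the earlier sketch, the example AST layout described above might look like this:

    example_ast = Node("SELECT", [
        Node("COLS", [Node("age")]),     # leftmost branch of the second layer
        Node("FROM", [Node("singer")]),  # middle branch of the second layer
        Node("WHERE", [Node("30")]),     # rightmost branch of the second layer
    ])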

The AST can be reordered based on the SCFG rules and tree transduction rules. In other words, a new AST can be generated in which one or more child nodes of an existing branch can be moved to another branch and/or a new branch, and/or an order of the branches can be rearranged (e.g., a leftmost branch becomes a rightmost branch). In some instances, the AST can be reordered by finding all the transduction rules that can apply to an existing tree. An SCFG rule and corresponding tree transduction rule apply when the head of the SCFG rule matches a node in the AST and the state of the node matches the state of the SCFG rule. In some instances, when a portion of the AST matches an SCFG rule, the corresponding tree transduction rule is applied, and a new AST subtree is generated and added to the AST by replacing the branch (or subtree) that matched the SCFG rule with the right-hand side of the SCFG rule as structured by the tree transduction rule. A complete tree transduction begins with the root node being in the initial state (e.g., the SELECT node) and then continues with states beneath the root node by propagating down the tree to the leaves until the entire tree has been transduced. In some instances, the TTS model can reorder the AST prior to decoding the reordered AST. In other instances, the AST can be reordered and decoded concurrently. In some instances, the TTS model can be trained with a dataset that includes SQL queries, translated NL utterances for those SQL queries, a tree transduction rule set, and an index that associates one or more tree transduction rules of the tree transduction rule set with each SQL query and its corresponding translated NL utterance.
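
A minimal sketch of this top-down transduction, assuming the Node type and example AST from the earlier sketches, is shown below; the state checks described above are elided, and reorder_select is an invented example of a transduction rule rather than a learned one.

    def transduce(node: Node, rules: dict) -> Node:
        # Starting at the root, apply the matching transduction rule at this
        # node, then propagate down toward the leaves.
        rule = rules.get(node.label)
        if rule is not None:
            node = rule(node)  # the rule returns a reordered copy of the subtree
        return Node(node.label, [transduce(c, rules) for c in node.children])

    def reorder_select(node: Node) -> Node:
        # Example rule: surface the WHERE branch before the COLS branch, as for
        # an utterance pattern like "for <value> ... show <column> of <table>".
        order = {"WHERE": 0, "COLS": 1, "FROM": 2}
        kids = sorted(node.children, key=lambda c: order.get(c.label, 3))
        return Node(node.label, kids)

    reordered_ast = transduce(example_ast, {"SELECT": reorder_select})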

In some instances, the generated NL utterances can be combined with the set of lexicalized logical forms to form generated training data (i.e., the synthetic training data). In some instances, each generated NL utterance can form a pair with the respective lexicalized logical form of the set of lexicalized logical forms that was used by the TTS model to generate the respective NL utterance. Upon generating the synthetic training data, the synthetic training data can be paraphrased to produce paraphrased synthetic training data. In some instances, the synthetic training data and/or the paraphrased synthetic training data can be combined with the original training data to produce updated training data.
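
The pairing step amounts to a simple merge, sketched below; the function and argument names are assumptions for illustration.

    def assemble_training_data(generated_utterances: list,
                               lexicalized_logical_forms: list,
                               original_training_data: list) -> list:
        # Pair each generated utterance with the lexicalized logical form that
        # produced it, then merge the pairs into the original training data.
        synthetic = [{"utterance": u, "logical_form": lf}
                     for u, lf in zip(generated_utterances,
                                      lexicalized_logical_forms)]
        return original_training_data + synthetic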

FIG. 5D illustrates an example process 5400 for transforming natural language to SQL. The processing depicted in FIG. 5D may be implemented in software (e.g., code, instructions, a program) executed by one or more processing units (e.g., one or more processors, cores) of the respective systems, hardware, or combinations thereof described throughout. The software may be stored on a non-transitory storage medium (e.g., on a memory device). Although the methods presented in FIG. 5D depict the various processing steps occurring in a particular sequence or order, this is not intended to be limiting. In certain alternative embodiments, the steps may be performed in parallel and/or in a different order. In certain embodiments, such as in the embodiment depicted in FIGS. 1-4F, the processing depicted in FIG. 5D may be performed by a pre-processing subsystem (e.g., pre-processing subsystem 210) and/or the model system 4900.

At block 5402, a machine learning model is trained with the original training data and the synthetic training data to translate an utterance to a logical form. In some instances, the logical form can be a SQL query.

At block 5404, an utterance is accessed. In some instances, the utterance corresponds to a natural language statement, query, and/or question. In some instances, the utterance is obtained from one or more sources such as a database (not shown), a computing system (e.g., data preprocessing subsystem), a user, or the like. In some instances, the user is a user interacting with the digital assistant, as described herein with respect to FIGS. 1-3.

At block 5406, the utterance is input into the trained machine learning model.

At block 5408, the utterance is translated into a logical form using the trained machine learning model. In some instances, the logical form is a SQL query.

At block 5410, the logical form is executed as a query on a database to retrieve a result for the query.

At block 5412, the result is output for the natural language utterance.
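
Blocks 5404 through 5412 can be sketched end to end as follows, assuming a hypothetical trained translator object exposing a translate() method and a SQLite database; neither detail is specified by the disclosure.

    import sqlite3

    def answer(utterance: str, model, db_path: str) -> list:
        # Block 5408: translate the utterance into a SQL query; `model` is a
        # hypothetical trained translator, not an API from the disclosure.
        sql = model.translate(utterance)
        # Block 5410: execute the logical form as a query on the database.
        with sqlite3.connect(db_path) as conn:
            result = conn.execute(sql).fetchall()
        # Block 5412: output the result for the natural language utterance.
        return result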

As used herein, when an action is “based on” something, this means the action is based at least in part on at least a part of the something. As used herein, the terms “substantially,” “approximately,” and “about” are defined as being largely but not necessarily wholly what is specified (and include wholly what is specified) as understood by one of ordinary skill in the art. In any disclosed embodiment, the term “substantially,” “approximately,” or “about” may be substituted with “within [a percentage] of” what is specified, where the percentage includes 0.1, 1, 5, and 10 percent.

Illustrative Systems

FIG. 6 depicts a simplified diagram of a distributed system 600. In the illustrated example, distributed system 600 includes one or more client computing devices 602, 604, 606, and 608, coupled to a server 612 via one or more communication networks 610. Client computing devices 602, 604, 606, and 608 may be configured to execute one or more applications.

In various examples, server 612 may be adapted to run one or more services or software applications that enable one or more embodiments described in this disclosure. In certain examples, server 612 may also provide other services or software applications that may include non-virtual and virtual environments. In some examples, these services may be offered as web-based or cloud services, such as under a Software as a Service (SaaS) model to the users of client computing devices 602, 604, 606, and/or 608. Users operating client computing devices 602, 604, 606, and/or 608 may in turn utilize one or more client applications to interact with server 612 to utilize the services provided by these components.

In the configuration depicted in FIG. 6, server 612 may include one or more components 618, 620, and 622 that implement the functions performed by server 612. These components may include software components that may be executed by one or more processors, hardware components, or combinations thereof. It should be appreciated that various different system configurations are possible, which may be different from distributed system 600. The example shown in FIG. 6 is thus one example of a distributed system for implementing an example system and is not intended to be limiting.

Users may use client computing devices 602, 604, 606, and/or 608 to execute one or more applications, models, or chatbots, which may generate one or more events or models that may then be implemented or serviced in accordance with the teachings of this disclosure. A client device may provide an interface that enables a user of the client device to interact with the client device. The client device may also output information to the user via this interface. Although FIG. 6 depicts only four client computing devices, any number of client computing devices may be supported.

The client devices may include various types of computing systems such as portable handheld devices, general purpose computers such as personal computers and laptops, workstation computers, wearable devices, gaming systems, thin clients, various messaging devices, sensors or other sensing devices, and/or the like. These computing devices may run various types and versions of software applications and operating systems (e.g., Microsoft Windows®, Apple Macintosh®, UNIX® or UNIX-like operating systems, Linux or Linux-like operating systems such as Google Chrome™ OS) including various mobile operating systems (e.g., Microsoft Windows Mobile®, iOS®, Windows Phone®, Android™, BlackBerry®, Palm OS®). Portable handheld devices may include cellular phones, smartphones (e.g., an iPhone), tablets (e.g., iPad®), personal digital assistants (PDAs), and the like. Wearable devices may include Google Glass® head mounted displays and other devices. Gaming systems may include various handheld gaming devices, Internet-enabled gaming devices (e.g., a Microsoft Xbox® gaming console with or without a Kinect® gesture input device, a Sony PlayStation® system, various gaming systems provided by Nintendo®, and others), and the like. The client devices may be capable of executing various different applications such as various Internet-related apps and communication applications (e.g., E-mail applications, short message service (SMS) applications) and may use various communication protocols.

Network(s) 610 may be any type of network familiar to those skilled in the art that may support data communications using any of a variety of available protocols, including without limitation TCP/IP (transmission control protocol/Internet protocol), SNA (systems network architecture), IPX (Internet packet exchange), AppleTalk®, and the like. Merely by way of example, network(s) 610 may be a local area network (LAN), networks based on Ethernet or Token-Ring, a wide-area network (WAN), the Internet, a virtual network, a virtual private network (VPN), an intranet, an extranet, a public switched telephone network (PSTN), an infra-red network, a wireless network (e.g., a network operating under any of the Institute of Electrical and Electronics Engineers (IEEE) 802.11 suite of protocols, Bluetooth®, and/or any other wireless protocol), and/or any combination of these and/or other networks.

Server 612 may be composed of one or more general purpose computers, specialized server computers (including, by way of example, PC (personal computer) servers, UNIX® servers, mid-range servers, mainframe computers, rack-mounted servers, etc.), server farms, server clusters, or any other appropriate arrangement and/or combination. Server 612 may include one or more virtual machines running virtual operating systems, or other computing architectures involving virtualization such as one or more flexible pools of logical storage devices that may be virtualized to maintain virtual storage devices for the server. In various examples, server 612 may be adapted to run one or more services or software applications that provide the functionality described in the foregoing disclosure.

The computing systems in server 612 may run one or more operating systems including any of those discussed above, as well as any commercially available server operating system. Server 612 may also run any of a variety of additional server applications and/or mid-tier applications, including HTTP (hypertext transfer protocol) servers, FTP (file transfer protocol) servers, CGI (common gateway interface) servers, JAVA® servers, database servers, and the like. Exemplary database servers include without limitation those commercially available from Oracle®, Microsoft®, Sybase®, IBM® (International Business Machines), and the like.

In some implementations, server 612 may include one or more applications to analyze and consolidate data feeds and/or event updates received from users of client computing devices 602, 604, 606, and 608. As an example, data feeds and/or event updates may include, but are not limited to, Twitter® feeds, Facebook® updates, or real-time updates received from one or more third party information sources and continuous data streams, which may include real-time events related to sensor data applications, financial tickers, network performance measuring tools (e.g., network monitoring and traffic management applications), clickstream analysis tools, automobile traffic monitoring, and the like. Server 612 may also include one or more applications to display the data feeds and/or real-time events via one or more display devices of client computing devices 602, 604, 606, and 608.

Distributed system 600 may also include one or more data repositories 614, 616. These data repositories may be used to store data and other information in certain examples. For example, one or more of the data repositories 614, 616 may be used to store information such as information related to chatbot performance or generated models for use by chatbots used by server 612 when performing various functions in accordance with various embodiments. Data repositories 614, 616 may reside in a variety of locations. For example, a data repository used by server 612 may be local to server 612 or may be remote from server 612 and in communication with server 612 via a network-based or dedicated connection. Data repositories 614, 616 may be of different types. In certain examples, a data repository used by server 612 may be a database, for example, a relational database, such as databases provided by Oracle Corporation® and other vendors. One or more of these databases may be adapted to enable storage, update, and retrieval of data to and from the database in response to SQL-formatted commands.

In certain examples, one or more of data repositories 614, 616 may also be used by applications to store application data. The data repositories used by applications may be of different types such as, for example, a key-value store repository, an object store repository, or a general storage repository supported by a file system.

In certain examples, the functionalities described in this disclosure may be offered as services via a cloud environment. FIG. 7 is a simplified block diagram of a cloud-based system environment in which various services may be offered as cloud services in accordance with certain examples. In the example depicted in FIG. 7, cloud infrastructure system 702 may provide one or more cloud services that may be requested by users using one or more client computing devices 704, 706, and 708. Cloud infrastructure system 702 may comprise one or more computers and/or servers that may include those described above for server 612. The computers in cloud infrastructure system 702 may be organized as general-purpose computers, specialized server computers, server farms, server clusters, or any other appropriate arrangement and/or combination.

Network(s) 710 may facilitate communication and exchange of data between clients 704, 706, and 708 and cloud infrastructure system 702. Network(s) 710 may include one or more networks. The networks may be of the same or different types. Network(s) 710 may support one or more communication protocols, including wired and/or wireless protocols, for facilitating the communications.

The example depicted in FIG. 7 is only one example of a cloud infrastructure system and is not intended to be limiting. It should be appreciated that, in some other examples, cloud infrastructure system 702 may have more or fewer components than those depicted in FIG. 7, may combine two or more components, or may have a different configuration or arrangement of components. For example, although FIG. 7 depicts three client computing devices, any number of client computing devices may be supported in alternative examples.

The term cloud service is generally used to refer to a service that is made available to users on demand and via a communication network such as the Internet by systems (e.g., cloud infrastructure system 702) of a service provider. Typically, in a public cloud environment, servers and systems that make up the cloud service provider's system are different from the customer's own on-premise servers and systems. The cloud service provider's systems are managed by the cloud service provider. Customers may thus avail themselves of cloud services provided by a cloud service provider without having to purchase separate licenses, support, or hardware and software resources for the services. For example, a cloud service provider's system may host an application, and a user may, via the Internet, on demand, order and use the application without the user having to buy infrastructure resources for executing the application. Cloud services are designed to provide easy, scalable access to applications, resources, and services. Several providers offer cloud services. For example, several cloud services are offered by Oracle Corporation® of Redwood Shores, Calif., such as middleware services, database services, Java cloud services, and others.

In certain examples, cloud infrastructure system 702 may provide one or more cloud services using different models such as under a Software as a Service (SaaS) model, a Platform as a Service (PaaS) model, an Infrastructure as a Service (IaaS) model, and others, including hybrid service models. Cloud infrastructure system 702 may include a suite of applications, middleware, databases, and other resources that enable provision of the various cloud services.

A SaaS model enables an application or software to be delivered to a customer over a communication network like the Internet, as a service, without the customer having to buy the hardware or software for the underlying application. For example, a SaaS model may be used to provide customers access to on-demand applications that are hosted by cloud infrastructure system 702. Examples of SaaS services provided by Oracle Corporation® include, without limitation, various services for human resources/capital management, customer relationship management (CRM), enterprise resource planning (ERP), supply chain management (SCM), enterprise performance management (EPM), analytics services, social applications, and others.

An IaaS model is generally used to provide infrastructure resources (e.g., servers, storage, hardware, and networking resources) to a customer as a cloud service to provide elastic compute and storage capabilities. Various IaaS services are provided by Oracle Corporation®.

A PaaS model is generally used to provide, as a service, platform and environment resources that enable customers to develop, run, and manage applications and services without the customer having to procure, build, or maintain such resources. Examples of PaaS services provided by Oracle Corporation® include, without limitation, Oracle Java Cloud Service (JCS), Oracle Database Cloud Service (DBCS), data management cloud service, various application development solutions services, and others.

Cloud services are generally provided in an on-demand, self-service, subscription-based, elastically scalable, reliable, highly available, and secure manner. For example, a customer, via a subscription order, may order one or more services provided by cloud infrastructure system 702. Cloud infrastructure system 702 then performs processing to provide the services requested in the customer's subscription order. For example, a user may use utterances to request the cloud infrastructure system to take a certain action (e.g., an intent), as described above, and/or provide services for a chatbot system as described herein. Cloud infrastructure system 702 may be configured to provide one or even multiple cloud services.

Cloud infrastructure system 702 may provide the cloud services via different deployment models. In a public cloud model, cloud infrastructure system 702 may be owned by a third party cloud services provider and the cloud services are offered to any general public customer, where the customer may be an individual or an enterprise. In certain other examples, under a private cloud model, cloud infrastructure system 702 may be operated within an organization (e.g., within an enterprise organization) and services provided to customers that are within the organization. For example, the customers may be various departments of an enterprise such as the Human Resources department, the Payroll department, etc., or even individuals within the enterprise. In certain other examples, under a community cloud model, the cloud infrastructure system 702 and the services provided may be shared by several organizations in a related community. Various other models such as hybrids of the above-mentioned models may also be used.

Client computing devices 704, 706, and 708 may be of different types (such as client computing devices 602, 604, 606, and 608 depicted in FIG. 6) and may be capable of operating one or more client applications. A user may use a client device to interact with cloud infrastructure system 702, such as to request a service provided by cloud infrastructure system 702. For example, a user may use a client device to request information or action from a chatbot as described in this disclosure.

In some examples, the processing performed by cloud infrastructure system 702 for providing services may involve model training and deployment. This analysis may involve using, analyzing, and manipulating data sets to train and deploy one or more models. This analysis may be performed by one or more processors, possibly processing the data in parallel, performing simulations using the data, and the like. For example, big data analysis may be performed by cloud infrastructure system 702 for generating and training one or more models for a chatbot system. The data used for this analysis may include structured data (e.g., data stored in a database or structured according to a structured model) and/or unstructured data (e.g., data blobs (binary large objects)).

As depicted in the example in FIG. 7, cloud infrastructure system 702 may include infrastructure resources 730 that are utilized for facilitating the provision of various cloud services offered by cloud infrastructure system 702. Infrastructure resources 730 may include, for example, processing resources, storage or memory resources, networking resources, and the like. In certain examples, the storage virtual machines that are available for servicing storage requested from applications may be part of cloud infrastructure system 702. In other examples, the storage virtual machines may be part of different systems.

In certain examples, to facilitate efficient provisioning of these resources for supporting the various cloud services provided by cloud infrastructure system 702 for different customers, the resources may be bundled into sets of resources or resource modules (also referred to as “pods”). Each resource module or pod may comprise a pre-integrated and optimized combination of resources of one or more types. In certain examples, different pods may be pre-provisioned for different types of cloud services. For example, a first set of pods may be provisioned for a database service, a second set of pods, which may include a different combination of resources than a pod in the first set of pods, may be provisioned for a Java service, and the like. For some services, the resources allocated for provisioning the services may be shared between the services.

Cloud infrastructure system 702 may itself internally use services 732 that are shared by different components of cloud infrastructure system 702 and which facilitate the provisioning of services by cloud infrastructure system 702. These internal shared services may include, without limitation, a security and identity service, an integration service, an enterprise repository service, an enterprise manager service, a virus scanning and whitelist service, a high availability, backup and recovery service, a service for enabling cloud support, an email service, a notification service, a file transfer service, and the like.

Cloud infrastructure system 702 may comprise multiple subsystems. These subsystems may be implemented in software, or hardware, or combinations thereof. As depicted in FIG. 7, the subsystems may include a user interface subsystem 712 that enables users or customers of cloud infrastructure system 702 to interact with cloud infrastructure system 702. User interface subsystem 712 may include various different interfaces such as a web interface 714, an online store interface 716 where cloud services provided by cloud infrastructure system 702 are advertised and are purchasable by a consumer, and other interfaces 718. For example, a customer may, using a client device, request (service request 734) one or more services provided by cloud infrastructure system 702 using one or more of interfaces 714, 716, and 718. For example, a customer may access the online store, browse cloud services offered by cloud infrastructure system 702, and place a subscription order for one or more services offered by cloud infrastructure system 702 that the customer wishes to subscribe to. The service request may include information identifying the customer and one or more services that the customer desires to subscribe to. For example, a customer may place a subscription order for a service offered by cloud infrastructure system 702. As part of the order, the customer may provide information identifying a chatbot system for which the service is to be provided and optionally one or more credentials for the chatbot system.

In certain examples, such as the example depicted in FIG. 7, cloud infrastructure system 702 may comprise an order management subsystem (OMS) 720 that is configured to process the new order. As part of this processing, OMS 720 may be configured to: create an account for the customer, if not done already; receive billing and/or accounting information from the customer that is to be used for billing the customer for providing the requested service to the customer; verify the customer information; upon verification, book the order for the customer; and orchestrate various workflows to prepare the order for provisioning.

Once properly validated, OMS 720 may then invoke the order provisioning subsystem (OPS) 724 that is configured to provision resources for the order including processing, memory, and networking resources. The provisioning may include allocating resources for the order and configuring the resources to facilitate the service requested by the customer order. The manner in which resources are provisioned for an order and the type of the provisioned resources may depend upon the type of cloud service that has been ordered by the customer. For example, according to one workflow, OPS 724 may be configured to determine the particular cloud service being requested and identify a number of pods that may have been pre-configured for that particular cloud service. The number of pods that are allocated for an order may depend upon the size/amount/level/scope of the requested service. For example, the number of pods to be allocated may be determined based upon the number of users to be supported by the service, the duration of time for which the service is being requested, and the like. The allocated pods may then be customized for the particular requesting customer for providing the requested service.

In certain examples, setup phase processing, as described above, may be performed by cloud infrastructure system 702 as part of the provisioning process. Cloud infrastructure system 702 may generate an application ID and select a storage virtual machine for an application from among storage virtual machines provided by cloud infrastructure system 702 itself or from storage virtual machines provided by systems other than cloud infrastructure system 702.

Cloud infrastructure system 702 may send a response or notification 744 to the requesting customer to indicate when the requested service is ready for use. In some instances, information (e.g., a link) may be sent to the customer that enables the customer to start using and availing the benefits of the requested services. In certain examples, for a customer requesting the service, the response may include a chatbot system ID generated by cloud infrastructure system 702 and information identifying a chatbot system selected by cloud infrastructure system 702 for the chatbot system corresponding to the chatbot system ID.

Cloud infrastructure system 702 may provide services to multiple customers. For each customer, cloud infrastructure system 702 is responsible for managing information related to one or more subscription orders received from the customer, maintaining customer data related to the orders, and providing the requested services to the customer. Cloud infrastructure system 702 may also collect usage statistics regarding a customer's use of subscribed services. For example, statistics may be collected for the amount of storage used, the amount of data transferred, the number of users, the amount of system up time and system down time, and the like. This usage information may be used to bill the customer. Billing may be done, for example, on a monthly cycle.

Cloud infrastructure system 702 may provide services to multiple customers in parallel. Cloud infrastructure system 702 may store information for these customers, including possibly proprietary information. In certain examples, cloud infrastructure system 702 comprises an identity management subsystem (IMS) 728 that is configured to manage customer information and provide the separation of the managed information such that information related to one customer is not accessible by another customer. IMS 728 may be configured to provide various security-related services such as identity services, such as information access management, authentication and authorization services, services for managing customer identities and roles and related capabilities, and the like.

FIG. 8 illustrates an example of computer system 800. In some examples, computer system 800 may be used to implement any of the digital assistant or chatbot systems within a distributed environment, and various servers and computer systems described above. As shown in FIG. 8, computer system 800 includes various subsystems including a processing subsystem 804 that communicates with a number of other subsystems via a bus subsystem 802. These other subsystems may include a processing acceleration unit 806, an I/O subsystem 808, a storage subsystem 818, and a communications subsystem 824. Storage subsystem 818 may include non-transitory computer-readable storage media including storage media 822 and a system memory 810.

Bus subsystem 802 provides a mechanism for letting the various components and subsystems of computer system 800 communicate with each other as intended. Although bus subsystem 802 is shown schematically as a single bus, alternative examples of the bus subsystem may utilize multiple buses. Bus subsystem 802 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, a local bus using any of a variety of bus architectures, and the like. For example, such architectures may include an Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus, which may be implemented as a Mezzanine bus manufactured to the IEEE P1386.1 standard, and the like.

Processing subsystem 804 controls the operation of computer system 800 and may comprise one or more processors, application specific integrated circuits (ASICs), or field programmable gate arrays (FPGAs). The processors may be single core or multicore processors. The processing resources of computer system 800 may be organized into one or more processing units 832, 834, etc. A processing unit may include one or more processors, one or more cores from the same or different processors, a combination of cores and processors, or other combinations of cores and processors. In some examples, processing subsystem 804 may include one or more special purpose co-processors such as graphics processors, digital signal processors (DSPs), or the like. In some examples, some or all of the processing units of processing subsystem 804 may be implemented using customized circuits, such as application specific integrated circuits (ASICs), or field programmable gate arrays (FPGAs).

In some examples, the processing units in processing subsystem 804 may execute instructions stored in system memory 810 or on computer readable storage media 822. In various examples, the processing units may execute a variety of programs or code instructions and may maintain multiple concurrently executing programs or processes. At any given time, some or all of the program code to be executed may be resident in system memory 810 and/or on computer-readable storage media 822, including potentially on one or more storage devices. Through suitable programming, processing subsystem 804 may provide various functionalities described above. In instances where computer system 800 is executing one or more virtual machines, one or more processing units may be allocated to each virtual machine.

In certain examples, a processing acceleration unit 806 may optionally be provided for performing customized processing or for off-loading some of the processing performed by processing subsystem 804 so as to accelerate the overall processing performed by computer system 800.

I/O subsystem 808 may include devices and mechanisms for inputting information to computer system 800 and/or for outputting information from or via computer system 800. In general, use of the term input device is intended to include all possible types of devices and mechanisms for inputting information to computer system 800. User interface input devices may include, for example, a keyboard, pointing devices such as a mouse or trackball, a touchpad or touch screen incorporated into a display, a scroll wheel, a click wheel, a dial, a button, a switch, a keypad, audio input devices with voice command recognition systems, microphones, and other types of input devices. User interface input devices may also include motion sensing and/or gesture recognition devices such as the Microsoft Kinect® motion sensor that enables users to control and interact with an input device, the Microsoft Xbox® 360 game controller, and devices that provide an interface for receiving input using gestures and spoken commands. User interface input devices may also include eye gesture recognition devices such as the Google Glass® blink detector that detects eye activity (e.g., “blinking” while taking pictures and/or making a menu selection) from users and transforms the eye gestures as inputs to an input device (e.g., Google Glass®). Additionally, user interface input devices may include voice recognition sensing devices that enable users to interact with voice recognition systems (e.g., Siri® navigator) through voice commands.

Other examples of user interface input devices include, without limitation, three dimensional (3D) mice, joysticks or pointing sticks, gamepads and graphic tablets, and audio/visual devices such as speakers, digital cameras, digital camcorders, portable media players, webcams, image scanners, fingerprint scanners, barcode readers, 3D scanners, 3D printers, laser rangefinders, and eye gaze tracking devices. Additionally, user interface input devices may include, for example, medical imaging input devices such as computed tomography, magnetic resonance imaging, positron emission tomography, and medical ultrasonography devices. User interface input devices may also include, for example, audio input devices such as MIDI keyboards, digital musical instruments, and the like.

In general, use of the term output device is intended to include all possible types of devices and mechanisms for outputting information from computer system 800 to a user or other computer. User interface output devices may include a display subsystem, indicator lights, or non-visual displays such as audio output devices, etc. The display subsystem may be a cathode ray tube (CRT), a flat-panel device, such as that using a liquid crystal display (LCD) or plasma display, a projection device, a touch screen, and the like. For example, user interface output devices may include, without limitation, a variety of display devices that visually convey text, graphics, and audio/video information such as monitors, printers, speakers, headphones, automotive navigation systems, plotters, voice output devices, and modems.

Storage subsystem 818 provides a repository or data store for storing information and data that is used by computer system 800. Storage subsystem 818 provides a tangible non-transitory computer-readable storage medium for storing the basic programming and data constructs that provide the functionality of some examples. Storage subsystem 818 may store software (e.g., programs, code modules, instructions) that, when executed by processing subsystem 804, provides the functionality described above. The software may be executed by one or more processing units of processing subsystem 804. Storage subsystem 818 may also provide authentication in accordance with the teachings of this disclosure.

Storage subsystem 818 may include one or more non-transitory memory devices, including volatile and non-volatile memory devices. As shown in FIG. 8, storage subsystem 818 includes a system memory 810 and a computer-readable storage media 822. System memory 810 may include a number of memories including a volatile main random access memory (RAM) for storage of instructions and data during program execution and a non-volatile read only memory (ROM) or flash memory in which fixed instructions are stored. In some implementations, a basic input/output system (BIOS), containing the basic routines that help to transfer information between elements within computer system 800, such as during start-up, may typically be stored in the ROM. The RAM typically contains data and/or program modules that are presently being operated and executed by processing subsystem 804. In some implementations, system memory 810 may include multiple different types of memory, such as static random access memory (SRAM), dynamic random access memory (DRAM), and the like.

By way of example, and not limitation, as depicted in FIG. 8, system memory 810 may load application programs 812 that are being executed, which may include various applications such as Web browsers, mid-tier applications, relational database management systems (RDBMS), etc., program data 814, and an operating system 816. By way of example, operating system 816 may include various versions of Microsoft Windows®, Apple Macintosh®, and/or Linux operating systems, a variety of commercially-available UNIX® or UNIX-like operating systems (including without limitation the variety of GNU/Linux operating systems, the Google Chrome® OS, and the like) and/or mobile operating systems such as iOS, Windows® Phone, Android® OS, BlackBerry® OS, Palm® OS operating systems, and others.

Computer-readable storage media 822 may store programming and data constructs that provide the functionality of some examples. Computer-readable media 822 may provide storage of computer-readable instructions, data structures, program modules, and other data for computer system 800. Software (programs, code modules, instructions) that, when executed by processing subsystem 804, provides the functionality described above may be stored in storage subsystem 818. By way of example, computer-readable storage media 822 may include non-volatile memory such as a hard disk drive, a magnetic disk drive, or an optical disk drive such as a CD ROM, DVD, a Blu-Ray® disk, or other optical media. Computer-readable storage media 822 may include, but is not limited to, Zip® drives, flash memory cards, universal serial bus (USB) flash drives, secure digital (SD) cards, DVD disks, digital video tape, and the like. Computer-readable storage media 822 may also include solid-state drives (SSD) based on non-volatile memory such as flash-memory based SSDs, enterprise flash drives, solid state ROM, and the like, SSDs based on volatile memory such as solid state RAM, dynamic RAM, static RAM, DRAM-based SSDs, magneto resistive RAM (MRAM) SSDs, and hybrid SSDs that use a combination of DRAM and flash memory based SSDs.

In certain examples, storage subsystem 818 may also include a computer-readable storage media reader 820 that may further be connected to computer-readable storage media 822. Reader 820 may receive and be configured to read data from a memory device such as a disk, a flash drive, etc.

In certain examples, computer system 800 may support virtualization technologies, including but not limited to virtualization of processing and memory resources. For example, computer system 800 may provide support for executing one or more virtual machines. In certain examples, computer system 800 may execute a program such as a hypervisor that facilitates the configuring and managing of the virtual machines. Each virtual machine may be allocated memory, compute (e.g., processors, cores), I/O, and networking resources. Each virtual machine generally runs independently of the other virtual machines. A virtual machine typically runs its own operating system, which may be the same as or different from the operating systems executed by other virtual machines executed by computer system 800. Accordingly, multiple operating systems may potentially be run concurrently by computer system 800.

Communications subsystem 824 provides an interface to other computer systems and networks. Communications subsystem 824 serves as an interface for receiving data from and transmitting data to other systems from computer system 800. For example, communications subsystem 824 may enable computer system 800 to establish a communication channel to one or more client devices via the Internet for receiving and sending information from and to the client devices. For example, when computer system 800 is used to implement bot system 120 depicted in FIG. 1, the communication subsystem may be used to communicate with a chatbot system selected for an application.

Communication subsystem 824 may support both wired and/or wireless communication protocols. In certain examples, communications subsystem 824 may include radio frequency (RF) transceiver components for accessing wireless voice and/or data networks (e.g., using cellular telephone technology, advanced data network technology such as 3G, 4G, or EDGE (enhanced data rates for global evolution), Wi-Fi (IEEE 802.XX family standards), or other mobile communication technologies, or any combination thereof), global positioning system (GPS) receiver components, and/or other components. In some examples, communications subsystem 824 may provide wired network connectivity (e.g., Ethernet) in addition to or instead of a wireless interface.

Communication subsystem 824 may receive and transmit data in various forms. In some examples, in addition to other forms, communications subsystem 824 may receive input communications in the form of structured and/or unstructured data feeds 826, event streams 828, event updates 830, and the like. For example, communications subsystem 824 may be configured to receive (or send) data feeds 826 in real-time from users of social media networks and/or other communication services such as Twitter® feeds, Facebook® updates, web feeds such as Rich Site Summary (RSS) feeds, and/or real-time updates from one or more third party information sources.

In certain examples, communications subsystem 824 may be configured to receive data in the form of continuous data streams, which may include event streams 828 of real-time events and/or event updates 830, that may be continuous or unbounded in nature with no explicit end. Examples of applications that generate continuous data may include, for example, sensor data applications, financial tickers, network performance measuring tools (e.g., network monitoring and traffic management applications), clickstream analysis tools, automobile traffic monitoring, and the like.

Communications subsystem 824 may also be configured to communicate data from computer system 800 to other computer systems or networks. The data may be communicated in various different forms such as structured and/or unstructured data feeds 826, event streams 828, event updates 830, and the like to one or more databases that may be in communication with one or more streaming data source computers coupled to computer system 800.

Computer system 800 may be one of various types, including a handheld portable device (e.g., an iPhone® cellular phone, an iPad® computing tablet, a PDA), a wearable device (e.g., a Google Glass® head mounted display), a personal computer, a workstation, a mainframe, a kiosk, a server rack, or any other data processing system. Due to the ever-changing nature of computers and networks, the description of computer system 800 depicted in FIG. 8 is intended only as a specific example. Many other configurations having more or fewer components than the system depicted in FIG. 8 are possible. Based on the disclosure and teachings provided herein, it should be appreciated there are other ways and/or methods to implement the various examples.

Although specific examples have been described, various modifications, alterations, alternative constructions, and equivalents are possible. Examples are not restricted to operation within certain specific data processing environments but are free to operate within a plurality of data processing environments. Additionally, although certain examples have been described using a particular series of transactions and steps, it should be apparent to those skilled in the art that this is not intended to be limiting. Although some flowcharts describe operations as a sequential process, many of the operations may be performed in parallel or concurrently. In addition, the order of the operations may be rearranged. A process may have additional steps not included in the figure. Various features and aspects of the above-described examples may be used individually or jointly.

Further, while certain examples have been described using a particular combination of hardware and software, it should be recognized that other combinations of hardware and software are also possible. Certain examples may be implemented only in hardware, or only in software, or using combinations thereof. The various processes described herein may be implemented on the same processor or different processors in any combination.

Where devices, systems, components, or modules are described as being configured to perform certain operations or functions, such configuration may be accomplished, for example, by designing electronic circuits to perform the operation, by programming programmable electronic circuits (such as microprocessors) to perform the operation such as by executing computer instructions or code, or by processors or cores programmed to execute code or instructions stored on a non-transitory memory medium, or any combination thereof. Processes may communicate using a variety of techniques including but not limited to conventional techniques for inter-process communications, and different pairs of processes may use different techniques, or the same pair of processes may use different techniques at different times.

Specific details are given in this disclosure to provide a thorough understanding of the examples. However, examples may be practiced without these specific details. For example, well-known circuits, processes, algorithms, structures, and techniques have been shown without unnecessary detail in order to avoid obscuring the examples. This description provides examples only, and is not intended to limit the scope, applicability, or configuration of other examples. Rather, the preceding description of the examples will provide those skilled in the art with an enabling description for implementing various examples. Various changes may be made in the function and arrangement of elements.

The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. It will, however, be evident that additions, subtractions, deletions, and other modifications and changes may be made thereunto without departing from the broader spirit and scope as set forth in the claims. Thus, although specific examples have been described, these are not intended to be limiting. Various modifications and equivalents are within the scope of the following claims.

In the foregoing specification, aspects of the disclosure are described with reference to specific examples thereof, but those skilled in the art will recognize that the disclosure is not limited thereto. Various features and aspects of the above-described disclosure may be used individually or jointly. Further, examples may be utilized in any number of environments and applications beyond those described herein without departing from the broader spirit and scope of the specification. The specification and drawings are, accordingly, to be regarded as illustrative rather than restrictive.

In the foregoing description, for the purposes of illustration, methods were described in a particular order. It should be appreciated that in alternate examples, the methods may be performed in a different order than that described. It should also be appreciated that the methods described above may be performed by hardware components or may be embodied in sequences of machine-executable instructions, which may be used to cause a machine, such as a general-purpose or special-purpose processor or logic circuits programmed with the instructions, to perform the methods. These machine-executable instructions may be stored on one or more machine readable mediums, such as CD-ROMs or other types of optical disks, floppy diskettes, ROMs, RAMs, EPROMs, EEPROMs, magnetic or optical cards, flash memory, or other types of machine-readable mediums suitable for storing electronic instructions. Alternatively, the methods may be performed by a combination of hardware and software.

Where components are described as being configured to perform certain operations, such configuration may be accomplished, for example, by designing electronic circuits or other hardware to perform the operation, by programming programmable electronic circuits (e.g., microprocessors, or other suitable electronic circuits) to perform the operation, or any combination thereof.

While illustrative examples of the application have been described in detail herein, it is to be understood that the inventive concepts may be otherwise variously embodied and employed, and that the appended claims are intended to be construed to include such variations, except as limited by the prior art.

What is claimed is:
 1. A computer-implemented method comprising: accessing original training data, the original training data including a plurality of utterances and a plurality of logical forms, each logical form of the plurality of logical forms corresponding to at least one utterance of the plurality of utterances; generating a set of abstract syntax trees for the plurality of logical forms; generating a set of delexicalized logical forms, each delexicalized logical form of the set of delexicalized logical forms being a delexicalized version of a logical form of the plurality of logical forms; generating a set of lexicalized logical forms by lexicalizing the set of delexicalized logical forms; generating, by a tree-to-string model, synthetic training data comprising an utterance for each lexicalized logical form of the set of lexicalized logical forms; and training a machine learning model with the original training data and the synthetic training data to translate an utterance to a logical form.
 2. The computer-implemented method of claim 1, wherein the set of abstract syntax trees are generated for the plurality of logical forms by parsing each logical form of the plurality of logical forms into an abstract syntax tree and normalizing the respective abstract syntax tree.
 3. The computer-implemented method of claim 1, wherein at least one delexicalized logical form of the set of delexicalized logical forms is generated automatically using a machine-learning model.
 4. The computer-implemented method of claim 1, wherein generating the set of lexicalized logical forms comprises analyzing each delexicalized logical form of the set of delexicalized logical forms to identify one or more constraints in the respective delexicalized logical form and sampling components of a database based on the identified one or more constraints in the respective delexicalized logical form.
 5. The computer-implemented method of claim 1, wherein the generating the synthetic training data comprises translating, by the tree-to-string model, each lexicalized logical form of the set of lexicalized logical forms into an utterance.
 6. The computer-implemented method of claim 1, wherein the generating the synthetic training data comprises reordering each abstract syntax tree of the set of abstract syntax trees and decoding each of the reordered abstract syntax trees into an utterance.
 7. The computer-implemented method of claim 1, further comprising: accessing an utterance; inputting the utterance into the trained machine learning model; translating, using the trained machine learning model, the utterance into a logical form; executing the logical form as a query on a database to retrieve a result for the query; and outputting the result for the utterance.
 8. A system comprising: one or more processors; and one or more non-transitory computer-readable media storing instructions which, when executed by the one or more processors, cause the one or more processors to perform operations comprising: accessing original training data, the original training data including a plurality of utterances and a plurality of logical forms, each logical form of the plurality of logical forms corresponding to at least one utterance of the plurality of utterances; generating a set of abstract syntax trees for the plurality of logical forms; generating a set of delexicalized logical forms, each delexicalized logical form of the set of delexicalized logical forms being a delexicalized version of a logical form of the plurality of logical forms; generating a set of lexicalized logical forms by lexicalizing the set of delexicalized logical forms; generating, by a tree-to-string model, synthetic training data comprising an utterance for each lexicalized logical form of the set of lexicalized logical forms; and training a machine learning model with the original training data and the synthetic training data to translate an utterance to a logical form.
9. The system of claim 8, wherein the set of abstract syntax trees is generated for the plurality of logical forms by parsing each logical form of the plurality of logical forms into an abstract syntax tree and normalizing the respective abstract syntax tree.
10. The system of claim 8, wherein at least one delexicalized logical form of the set of delexicalized logical forms is generated automatically using a machine-learning model.
11. The system of claim 8, wherein generating the set of lexicalized logical forms comprises analyzing each delexicalized logical form of the set of delexicalized logical forms to identify one or more constraints in the respective delexicalized logical form and sampling components of a database based on the identified one or more constraints in the respective delexicalized logical form.
12. The system of claim 8, wherein generating the synthetic training data comprises translating, by the tree-to-string model, each lexicalized logical form of the set of lexicalized logical forms into an utterance.
13. The system of claim 8, wherein generating the synthetic training data comprises reordering each abstract syntax tree of the set of abstract syntax trees and decoding each of the reordered abstract syntax trees into an utterance.
 14. The system of claim 8, the operations further comprising: accessing an utterance; inputting the utterance into the trained machine learning model; translating, using the trained machine learning model, the utterance into a logical form; executing the logical form as a query on a database to retrieve a result for the query; and outputting the result for the utterance.
15. A computer-program product tangibly embodied in one or more non-transitory machine-readable media, including instructions configured to cause one or more data processors to perform the following operations:
accessing original training data, the original training data including a plurality of utterances and a plurality of logical forms, each logical form of the plurality of logical forms corresponding to at least one utterance of the plurality of utterances;
generating a set of abstract syntax trees for the plurality of logical forms;
generating a set of delexicalized logical forms, each delexicalized logical form of the set of delexicalized logical forms being a delexicalized version of a logical form of the plurality of logical forms;
generating a set of lexicalized logical forms by lexicalizing the set of delexicalized logical forms;
generating, by a tree-to-string model, synthetic training data comprising an utterance for each lexicalized logical form of the set of lexicalized logical forms; and
training a machine learning model with the original training data and the synthetic training data to translate an utterance to a logical form.
16. The computer-program product of claim 15, wherein the set of abstract syntax trees is generated for the plurality of logical forms by parsing each logical form of the plurality of logical forms into an abstract syntax tree and normalizing the respective abstract syntax tree.
17. The computer-program product of claim 15, wherein at least one delexicalized logical form of the set of delexicalized logical forms is generated automatically using a machine-learning model.
18. The computer-program product of claim 15, wherein generating the set of lexicalized logical forms comprises analyzing each delexicalized logical form of the set of delexicalized logical forms to identify one or more constraints in the respective delexicalized logical form and sampling components of a database based on the identified one or more constraints in the respective delexicalized logical form.
19. The computer-program product of claim 15, wherein generating the synthetic training data comprises translating, by the tree-to-string model, each lexicalized logical form of the set of lexicalized logical forms into an utterance.
20. The computer-program product of claim 15, wherein generating the synthetic training data comprises reordering each abstract syntax tree of the set of abstract syntax trees and decoding each of the reordered abstract syntax trees into an utterance.